1 Introduction



Partial linear models have attracted considerable attention because they flexibly combine
traditional linear models with nonparametric regression models. See, e.g., Heckman (1986),
Rice (1986), Chen (1988), Bhattacharya and Zhao (1997), Xia and Härdle (2006), and the
comprehensive books by Härdle, Gao and Liang (2000) and Ruppert, Wand and
Carroll (2003) for additional references. However, the nonparametric components are subject
to the curse of dimensionality and can only accommodate low-dimensional covariates $X$. To
remedy this, a viable option is a dimension reduction model which assumes that the influence
of the covariate $X$ can be collapsed to a single index, $X^T\beta$, through a nonparametric
link function; this is termed the partial-linear single-index model. Specifically, it takes the form:

$$Y = Z^T\theta_0 + g(X^T\beta_0) + \varepsilon, \qquad (1.1)$$

where $(X, Z) \in \mathbb{R}^p \times \mathbb{R}^q$ are covariates of the response variable $Y$, $g$ is an unknown link
function for the single index, and $\varepsilon$ is the error term with $E(\varepsilon) = 0$ and $0 < \mathrm{Var}(\varepsilon) = \sigma^2 < \infty$.
For the sake of identifiability, it is often assumed that $\|\beta_0\| = 1$ and the $r$th component of
$\beta_0$ is positive, where $\|\cdot\|$ denotes the Euclidean metric.

This model is quite general: it includes the aforementioned partial-linear model when
the dimension of $X$ is one, and also the popular single-index model in the absence of the
linear covariate $Z$. There is an extensive literature for the single-index model with three
main approaches: projection pursuit regression (PPR) [Friedman and Stuetzle (1981), Hall
(1989), Härdle, Hall and Ichimura (1993)]; the average derivative approach [Stoker (1986),
Doksum and Samarov (1995), and Hristache, Juditsky and Spokoiny (2001)]; and sliced
inverse regression (SIR) and related methods [Li (1991), Cook and Li (2002), Xia, Tong, Li
and Zhu (2002), and Yin and Cook (2002)]. All these approaches rely on the assumption
that the predictors in $X$ are continuous variables, while model (1.1) compensates for this by
allowing discrete or other continuous variables to be linearly associated with the response
variable. To our knowledge, Carroll, Fan, Gijbels and Wand (1997) were the first to explore
model (1.1), and they actually considered a generalized version, where a known link function
is employed in the regression function, while model (1.1) assumes an identity link function.
However, their approaches may become computationally unstable, as observed by Yu and
Ruppert (2002) and confirmed by our simulations in Section 3. The theory of Carroll, Fan,
Gijbels and Wand (1997) also relies on the strong assumption that their estimator for $\theta_0$ is
already $\sqrt{n}$-consistent. Yu and Ruppert (2002) alleviated both difficulties by employing a
link function $g$ which falls in a finite-dimensional spline space, yielding essentially a flexible
parametric model. Xia and Härdle (2006) used a method that is based on a local polynomial
smoother and a modified version of least squares in Härdle, Hall and Ichimura (1993).

In this paper, we propose a new estimation procedure. Our approach requires no iteration 
and works well under the mild condition that a few indices based on X suffice to explain Z. 
Namely, 

$$Z = \phi(X^T\beta_Z) + \eta, \qquad (1.2)$$

where $\phi(\cdot)$ is an unknown function from $\mathbb{R}^d$ to $\mathbb{R}^q$, $\beta_Z$ is a $p \times d$ matrix with orthonormal
columns, and $\eta$ has mean zero and is independent of $X$. The dimension $d$ is often much smaller
than the dimension $p$ of $X$. Such an assumption is not stringent and is common in most
dimension reduction approaches in the literature. A theoretical justification is provided in
Li, Wen and Zhu (2008). Model (1.2) implies that a few indices of $X$ suffice to summarize
all the information carried in $X$ to predict $Z$, which is often the case in reality, such as
for the Boston Housing data in Section 4, where a single index was selected for model (1.2)
and $Z$ is a discrete variable. In this data set, first analyzed in Harrison and Rubinfeld (1978),
the response variable is the median value of houses in 506 census tracts in the Boston area.
The covariates include: average number of rooms, the proportion of houses built before
1940, eight variables describing the neighborhood, two variables describing the accessibility
to highways and employment centers, and two variables describing air pollution. A key
covariate of interest is a binary variable that specifies whether a house borders the river or
not. Our analysis presented in Section 4, based on the dimension reduction assumptions of
(1.1) and with $Z$ equal to this binary variable in (1.2), demonstrates the advantages of our
model assumption: only one index ($d = 1$) was needed in model (1.2) for this data set.

To avoid the computational complications that we experienced with the procedure in
Carroll et al. (1997), who aim at estimating $\beta_0$ and $\theta_0$ simultaneously, we choose to estimate
$\beta_0$ and $\theta_0$ sequentially. The idea is simple: $\theta_0$ can be estimated optimally through approaches
developed for partial linear models once we have a $\sqrt{n}$-consistent estimate of $\beta_0$ and plug it into (1.1).
However, $\beta_0$ and $\theta_0$ may be correlated, leading to difficulties in identifying $\beta_0$. This is where
model (1.2) comes in handy, as it allows us to remove the part of $Z$ that is related to $X$ so
that the residual $\eta$ in (1.2) is independent of $X$. Again, we need to impose the identifiability
condition that $\beta_Z$ has norm one and a positive first component. The procedure is as follows:
First estimate $\beta_Z$ via any dimension reduction approach, such as SIR or PPR for $q = 1$,
and the projective resampling method in Li, Wen and Zhu (2008) for $q > 1$. Once $\beta_Z$ has
been estimated, we proceed to estimate $\phi$ via a $d$-dimensional smoother and then obtain the
residual for $\eta$. Since $\eta = Z - \phi(X^T\beta_Z)$, plugging this into (1.1) we get

$$Y = \eta^T\theta_0 + h(X^T\beta_0, X^T\beta_Z) + \varepsilon,$$

where $h$ is an unknown function, but now $\eta$ and $X$ are independent of each other. It is thus
possible to employ a least squares approach to estimate $\theta_0$, and the resulting estimate will
be $\sqrt{n}$-consistent. We then apply a dimension reduction procedure to $Y - Z^T\hat\theta_0$ and $X$ to
obtain an estimate for $\beta_0$ and $g$. This concludes the first stage, where the resulting estimates
for $\theta_0$ and $\beta_0$ are already $\sqrt{n}$-consistent but serve as initial estimates for the next
stage, where we update all the estimates using a more sophisticated approach. Specifically,
for $\theta_0$ we apply the profile method, also called partial regression in Speckman (1988), to
estimate $\theta_0$. Theoretical results in Section 2.2 indicate that the two-stage procedure is fully
efficient, so there is no need for iteration. More importantly, to estimate the index $\beta_0$, we use
an estimating equation to obtain asymptotic normality, which takes the constraint $\|\beta_0\| = 1$
into account. The estimator based on this new estimating equation performs better in several
ways, summarized as follows.

1. Our estimation procedure directly targets the model parameters $\theta_0$, $\beta_0$, $\beta_Z$, $\phi(\cdot)$ and
$g(\cdot)$, and no iteration is needed.

2. We obtain the asymptotic normality of the estimator of $\beta_0$ and the optimal convergence
rate of the estimator of $g(\cdot)$, as well as the asymptotic normality of the estimator of
$\theta_0$. The most attractive feature of this new method is that the estimator of $\beta_0$ has a
smaller limiting variance than three existing approaches: Härdle et
al. (1993) when the model is reduced to the single-index model, Carroll et al. (1997)
if their link function is the identity function, and Xia and Härdle (2006) when their
model is homoscedastic. This is the first result providing such a small limiting variance
in this area.

3. We also provide the asymptotic normality of the estimator of $\sigma^2$. It allows us to
consider the construction of confidence regions and hypothesis testing for $\theta_0$ and $\beta_0$.

The rest of the paper is organized as follows. In Section 2, we elaborate on the new 
methodology and then present the asymptotic properties for the estimators. Section 3 reports 
the results of a simulation study and Section 4 an application to a real data example for 
illustration. Section 5 gives the proofs of the main theorems. Some lemmas and their proofs 
are relegated to the Appendix. 

2 Methodology and Main Results 

2.1 Estimating Procedures 

The observations are $\{(X_i, Y_i, Z_i);\ 1 \le i \le n\}$, a sequence of independent and identically
distributed (i.i.d.) samples from (1.1), i.e.,

$$Y_i = Z_i^T\theta_0 + g(X_i^T\beta_0) + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. random errors with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2 > 0$, $\{\varepsilon_i;\ 1 \le i \le n\}$
are independent of $\{(X_i, Z_i);\ 1 \le i \le n\}$, $X_i = (X_{i1}, \ldots, X_{ip})^T$, $Z_i = (Z_{i1}, \ldots, Z_{iq})^T$, $\beta_0 \in \mathbb{R}^p$
and $\theta_0 \in \mathbb{R}^q$. For simplicity of presentation, we initially assume that $Z$ can be recovered
from a single index of $X$. That is, $d = 1$ in (1.2). The general case will be explored at the
end of this section in Remark 2. Below we first outline the steps for each stage and then
elaborate on each of these steps.

Algorithm for Stage One:

1. Apply a dimension reduction method for the regression of $Z_i$ versus $X_i$ to find an
estimator $\hat\beta_Z$ of $\beta_Z$;

2. Smooth the $Z_i$ over $X_i^T\hat\beta_Z$ to get an estimator $\hat\phi(\cdot)$ of $\phi(\cdot)$, then compute the residuals
$\hat\eta_i = Z_i - \hat\phi(X_i^T\hat\beta_Z)$;

3. Perform a linear regression of $Y_i$ versus the $\hat\eta_i$'s to find an initial estimator $\hat\theta_0$ of $\theta_0$;

4. Apply a dimension reduction method to the regression of $Y_i - Z_i^T\hat\theta_0$ versus $X_i$ to find
an initial estimator $\hat\beta_0$ of $\beta_0$;

5. Smooth the $Y_i - Z_i^T\hat\theta_0$ versus the $X_i^T\hat\beta_0$ to obtain an estimator for $g$ and for its
derivative $g'$.

Algorithm for Stage Two: 

6. Use the initial estimate $\hat\beta_0$ from Step 4 to update the estimate of $\theta_0$ through a profile
approach for the partial linear model by minimizing (2.5).

7. Use the updated estimate $\hat\theta$ of $\theta_0$ from Step 6 to form the new residual $Y - Z^T\hat\theta$, then
update the estimate of $\beta_0$ by solving the estimating equation (2.10).

8. Use the updated estimates of $\theta_0$ and $\beta_0$ in Steps 6 and 7 to update the estimate of $g$,
following the procedure as described in Step 5.

This completes the algorithm and, as we show in Section 2.2, the resulting estimators 
are already theoretically efficient. However, the practical performance can be improved by 
iterating Steps 6 and 7 one or more times. Our experience, through simulation studies not 
reported in this paper, reveals limited benefits when iterating more than once. 

Next, we elaborate on each of the steps in the above algorithms for the simple case
of a single index ($d = 1$). For the dimension reduction method in Step 4, one can use
any of several existing methods, such as SIR or one of its variants, PPR, or the minimum
average variance estimator (MAVE) of Xia, Tong, Li and Zhu (2002). These methods are
for univariate responses and hence can also be applied in Step 1 when $q = 1$. However,
when $q > 1$, a different method is needed in Step 1 for the case of a multivariate response,
and we recommend the dimension reduction method in Li, Wen and Zhu (2008). This and
other results in the literature already demonstrate the $\sqrt{n}$-consistency of these dimension
reduction methods.

For the smoothing involved in Step 5, one can choose any one-dimensional smoother. We
employ the local polynomial smoother (Fan and Gijbels, 1996) to obtain estimators of the
link function $g$ and its derivative $g'$, which will be used in the second stage of the estimation
procedure. Specifically, for a kernel function $K(\cdot)$ on $\mathbb{R}^1$ and a bandwidth sequence $b = b_n$,
define $K_b(\cdot) = b^{-1}K(\cdot/b)$. For fixed $\beta$ and $\theta$, the local linear smoother minimizes
the weighted sum of squares

$$\sum_{i=1}^{n} \left[Y_i - Z_i^T\theta - d_0 - d_1(X_i^T\beta - t)\right]^2 K_b(X_i^T\beta - t)$$

with respect to the parameters $d_\nu$, $\nu = 0, 1$. Let $h = h_n$ and $h_1 = h_{1n}$ denote the bandwidths
for estimating $g(\cdot)$ and $g'(\cdot)$, respectively. A simple calculation shows that the local linear
smoother with these specifications can be represented as

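$$\hat g(t; \beta, \theta) = \sum_{i=1}^{n} W_{ni}(t; \beta)(Y_i - Z_i^T\theta) \qquad (2.1)$$

and

$$\hat g'(t; \beta, \theta) = \sum_{i=1}^{n} \widetilde W_{ni}(t; \beta)(Y_i - Z_i^T\theta), \qquad (2.2)$$

where

$$W_{ni}(t; \beta) = \frac{n^{-1}K_h(X_i^T\beta - t)\left[S_{n,2}(t; \beta, h) - (X_i^T\beta - t)S_{n,1}(t; \beta, h)\right]}{S_{n,0}(t; \beta, h)S_{n,2}(t; \beta, h) - S_{n,1}^2(t; \beta, h)}, \qquad (2.3)$$

$$\widetilde W_{ni}(t; \beta) = \frac{n^{-1}K_{h_1}(X_i^T\beta - t)\left[(X_i^T\beta - t)S_{n,0}(t; \beta, h_1) - S_{n,1}(t; \beta, h_1)\right]}{S_{n,0}(t; \beta, h_1)S_{n,2}(t; \beta, h_1) - S_{n,1}^2(t; \beta, h_1)}, \qquad (2.4)$$

and

$$S_{n,l}(t; \beta, h) = \frac{1}{n}\sum_{i=1}^{n} (X_i^T\beta - t)^l K_h(X_i^T\beta - t), \quad l = 0, 1, 2.$$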

The above estimators are for generic fixed values of $\beta$ and $\theta$. To obtain the estimates
needed in Step 5, one replaces them with the initial values $\hat\beta_0$ obtained in Step 4 and $\hat\theta_0$
obtained in Step 3, respectively. We will show in Theorem 4 that this results in standard
convergence rates for the estimate of $g$.
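The local linear fit behind (2.1) and (2.2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `gaussian_kernel` and `local_linear` are hypothetical, a Gaussian kernel is assumed, and a single bandwidth is passed per call, so that the two bandwidths $h$ and $h_1$ would be handled by calling the function twice and keeping the first output from one call and the second from the other.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear(t, index, resid, h):
    """Local linear estimates (g_hat(t), g_prime_hat(t)).

    index[i] plays the role of X_i^T beta and resid[i] of Y_i - Z_i^T theta.
    The intercept d_0 estimates g(t); the slope d_1 estimates g'(t).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x_i, y_i in zip(index, resid):
        u = x_i - t
        k = gaussian_kernel(u / h) / h      # K_h(u) = K(u / h) / h
        s0 += k
        s1 += k * u
        s2 += k * u * u
        t0 += k * y_i
        t1 += k * u * y_i
    det = s0 * s2 - s1 * s1                 # common denominator of the weights
    g_hat = (s2 * t0 - s1 * t1) / det       # weighted least squares intercept
    g_prime_hat = (s0 * t1 - s1 * t0) / det  # weighted least squares slope
    return g_hat, g_prime_hat
```

A useful sanity check is that a local linear smoother reproduces exactly linear data, whatever the bandwidth.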



Likewise, a local linear smoother can be employed in Step 2 for estimating the unknown
function $\phi$ in model (1.2). The resulting estimator is defined as

$$\hat\phi(t; \hat\beta_Z) = \sum_{i=1}^{n} W_{ni}(t; \hat\beta_Z) Z_i.$$

Several possibilities are available for the estimator of $\theta_0$ in Step 6, such as the profile
approach (termed "partial regression" in Speckman, 1988) or the partial spline approach
(Heckman, 1986). Here the partial spline approach is not suitable for correlated $X$ and
$Z$, so we adopt a profile approach and a local linear smoother. In short, this amounts to
minimizing, over all $\theta$, the sum of squared errors,

$$\sum_{i=1}^{n} \left[Y_i - Z_i^T\theta - \hat g(X_i^T\hat\beta_0; \hat\beta_0, \theta)\right]^2, \qquad (2.5)$$

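where $\hat g$ is the estimator in (2.1) of $g$, obtained by smoothing $Y_i - Z_i^T\theta$ versus $X_i^T\hat\beta_0$, and $\hat\beta_0$
is an initial estimator of $\beta_0$, which could be the initial estimator $\hat\beta_0$ in Step 4 or the refined
estimator from Step 7 when an iterated estimator for $\theta_0$ is desirable. Because this smoother
is expressed as a function of $\theta$, the estimate derived from (2.5) is a profile estimate. More
details about the derivation and advantages of the profile approach can be found in Speckman
(1988). Specifically, let $\hat\beta_0$ be the current estimator, $\widetilde Y = (\widetilde Y_1, \ldots, \widetilde Y_n)^T$, $\widetilde Z = (\widetilde Z_1, \ldots, \widetilde Z_n)^T$,
where

$$\widetilde Y_i = Y_i - \hat g_1(X_i^T\hat\beta_0; \hat\beta_0), \quad \widetilde Z_i = Z_i - \hat g_2(X_i^T\hat\beta_0; \hat\beta_0),$$

$$\hat g_1(t; \hat\beta_0) = \sum_{i=1}^{n} W_{ni}(t; \hat\beta_0) Y_i, \quad \hat g_2(t; \hat\beta_0) = \sum_{i=1}^{n} W_{ni}(t; \hat\beta_0) Z_i,$$

with $\hat g_1$ and $\hat g_2$ the respective estimators of $g_1(t) = E(Y \mid X^T\beta_0 = t)$ and $g_2(t) = E(Z \mid X^T\beta_0 = t)$.
The resulting partial regression estimator is thus

$$\hat\theta = (\widetilde Z^T \widetilde Z)^{-1}\widetilde Z^T \widetilde Y. \qquad (2.6)$$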
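To make the partial regression estimator in (2.6) concrete, the following sketch treats the special case of a scalar linear covariate ($q = 1$), so that the matrix inverse reduces to a division. The names `ll_weights` and `profile_theta` are hypothetical, and a Gaussian kernel is assumed; the kernel's normalizing constant is omitted because it cancels in the weights.

```python
import math


def ll_weights(t, index, h):
    """Local linear weights W_ni(t), i = 1..n, for a Gaussian kernel.

    The weights are normalized so that they sum to one.
    """
    u = [x - t for x in index]
    k = [math.exp(-0.5 * (v / h) ** 2) for v in u]
    s1 = sum(ki * ui for ki, ui in zip(k, u))
    s2 = sum(ki * ui * ui for ki, ui in zip(k, u))
    raw = [ki * (s2 - ui * s1) for ki, ui in zip(k, u)]
    total = sum(raw)
    return [r / total for r in raw]


def profile_theta(index, y, z, h):
    """Partial regression estimate of a scalar theta.

    index[i] stands for X_i^T beta_hat. Both y and z are centered by their
    smoothed values at each index point, then regressed on one another.
    """
    y_tilde, z_tilde = [], []
    for i, t in enumerate(index):
        w = ll_weights(t, index, h)
        y_tilde.append(y[i] - sum(wj * yj for wj, yj in zip(w, y)))
        z_tilde.append(z[i] - sum(wj * zj for wj, zj in zip(w, z)))
    num = sum(a * b for a, b in zip(z_tilde, y_tilde))
    den = sum(a * a for a in z_tilde)
    return num / den
```

For vector-valued $Z$ the same centering applies componentwise, and the final division is replaced by solving the $q \times q$ normal equations.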

For the estimator of $\beta_0$ in Step 7, we propose a novel method that takes advantage of the
constraint $\|\beta_0\| = 1$ and hence is more efficient than existing approaches, including the
PPR approach in Härdle et al. (1993), the MAVE method in Xia et al. (2002), and the
least squares approaches of Carroll et al. (1997) and Xia and Härdle (2006) for the single-index
partial linear model in (1.1). It is worth mentioning that Xia and Härdle (2006)
allow a possible heteroscedastic structure in (1.1), and that least squares approaches have been
standard dimension reduction methods and lead to the same asymptotic variances for estimators of
$\beta_0$. For instance, in the homoscedastic case, the estimator in Xia and Härdle (2006) has an
asymptotic variance that is identical to that of Härdle et al. (1993). Our approach, based
on an estimating equation under the constraint $\|\beta_0\| = 1$, is computationally stable and
asymptotically more efficient, i.e., its asymptotic variance is smaller. The efficiency gain can
be attributed to a re-parametrization, making use of the constraint $\|\beta_0\| = 1$ by transferring
restricted least squares to unrestricted least squares, which makes it possible to search for
the solution of the estimating equation over a restricted region in the Euclidean space $\mathbb{R}^{p-1}$.

Without loss of generality, we may assume that the true parameter $\beta_0$ has a positive
component (otherwise, consider $-\beta_0$), say $\beta_{0r} > 0$ for $\beta_0 = (\beta_{01}, \ldots, \beta_{0p})^T$ and $1 \le r \le p$.
For $\beta = (\beta_1, \ldots, \beta_p)^T$, let $\beta^{(r)} = (\beta_1, \ldots, \beta_{r-1}, \beta_{r+1}, \ldots, \beta_p)^T$ be the $(p-1)$-dimensional parameter
vector obtained by removing the $r$th component $\beta_r$ from $\beta$. Then we may write

$$\beta = \beta(\beta^{(r)}) = (\beta_1, \ldots, \beta_{r-1}, (1 - \|\beta^{(r)}\|^2)^{1/2}, \beta_{r+1}, \ldots, \beta_p)^T. \qquad (2.7)$$

The true parameter $\beta_0^{(r)}$ must satisfy the constraint $\|\beta_0^{(r)}\| < 1$, and $\beta$ is infinitely differentiable
in a neighborhood of $\beta_0^{(r)}$. This "remove-one-component" method for $\beta$ has also been
applied in Yu and Ruppert (2002).

To obtain the estimator, consider the Jacobian matrix of $\beta$ with respect to $\beta^{(r)}$,

$$J_{\beta^{(r)}} = \frac{\partial\beta}{\partial\beta^{(r)}} = (\gamma_1, \ldots, \gamma_p)^T, \qquad (2.8)$$

where $\gamma_s$ ($1 \le s \le p$, $s \ne r$) is a $(p-1)$-dimensional unit vector with $s$th component 1, and
$\gamma_r = -(1 - \|\beta^{(r)}\|^2)^{-1/2}\beta^{(r)}$. To motivate the estimating equation, we start with the least
squares criterion:

$$D(\beta) := \sum_{i=1}^{n} \left[Y_i - Z_i^T\hat\theta - \hat g(X_i^T\beta; \beta, \hat\theta)\right]^2. \qquad (2.9)$$

From (2.7) and (2.9) we find $D(\beta) = D(\beta(\beta^{(r)})) = \widetilde D(\beta^{(r)})$. Therefore, we may obtain an
estimator of $\beta_0^{(r)}$, say $\hat\beta^{(r)}$, by minimizing $\widetilde D(\beta^{(r)})$, and then obtain an estimator $\hat\beta$ of $\beta_0$ via
the transformation (2.7). This means that we transform a restricted least squares problem into an
unrestricted least squares problem by solving the estimating equation:

$$\sum_{i=1}^{n} \left[Y_i - Z_i^T\hat\theta - \hat g(X_i^T\beta; \beta, \hat\theta)\right] \hat g'(X_i^T\beta; \beta, \hat\theta)\, J_{\beta^{(r)}}^T X_i = 0. \qquad (2.10)$$

We define the resulting estimator $\hat\beta$ of $\beta_0$ as the final target estimator. Theorem 3 implies
that our estimator for $\beta_0$ has a smaller limiting variance than the estimators in Xia and
Härdle (2006) and Carroll et al. (1997).
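The "remove-one-component" map in (2.7) and its Jacobian in (2.8) are simple to write down explicitly. The sketch below is illustrative only: the function names are hypothetical, and the removed position `r` is 0-based, unlike the 1-based $r$ in the text.

```python
import math


def beta_from_reduced(b_red, r):
    """Rebuild the unit vector beta from beta^(r), as in (2.7).

    b_red is a list of length p - 1 with Euclidean norm below one;
    r is the 0-based position of the removed (positive) component.
    """
    root = math.sqrt(1.0 - sum(v * v for v in b_red))
    return b_red[:r] + [root] + b_red[r:]


def jacobian(b_red, r):
    """p x (p - 1) Jacobian of beta with respect to beta^(r), as in (2.8).

    Row r equals -beta^(r) / sqrt(1 - ||beta^(r)||^2); every other row is
    a unit vector picking out the matching coordinate of beta^(r).
    """
    p1 = len(b_red)
    root = math.sqrt(1.0 - sum(v * v for v in b_red))
    rows = []
    for s in range(p1 + 1):
        if s == r:
            rows.append([-v / root for v in b_red])
        else:
            col = s if s < r else s - 1
            rows.append([1.0 if j == col else 0.0 for j in range(p1)])
    return rows
```

One property worth noting is that $J_{\beta^{(r)}}^T\beta = 0$: the columns of the Jacobian span directions tangent to the unit sphere at $\beta$, which is why the search in (2.10) effectively takes place in $p - 1$ dimensions.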

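With $\hat\theta$ and $\hat\beta$, the final estimator $\hat g^*$ of $g$ in Step 8 can be defined by

$$\hat g^*(t) := \hat g(t; \hat\beta, \hat\theta) = \sum_{i=1}^{n} W_{ni}(t; \hat\beta)(Y_i - Z_i^T\hat\theta),$$

and the estimator $\hat\sigma^2$ of $\sigma^2$ by $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[Y_i - Z_i^T\hat\theta - \hat g^*(X_i^T\hat\beta)\right]^2$. Asymptotic results for
the final parameter estimates of $\theta$ and $\beta$ are established in Theorem 1 and Theorem 2, and
results for the link estimate of $g$ follow from Theorem 4.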

Remark 1 We consider a homoscedastic model (1.1) with $d = 1$ in model (1.2). While
the estimation procedure can be extended easily to heteroscedastic errors, an additional
dimension reduction assumption on the variance function of $\eta$, given $X$, is needed to avoid
the curse of dimensionality in the smoother needed in Step 2 to estimate $\phi$. This assumption
requires that this variance function is also a function of a few indices based on $X$. Moreover,
the extension of the asymptotic theory is not straightforward. For instance, establishing the asymptotic
efficiency of the estimator $\hat\beta$ is technically challenging in the heteroscedastic case, and its
study is beyond the scope of this paper.

Remark 2 So far, we have assumed that $d = 1$. This assumption can be extended
without difficulty to the general case where $d$ might be greater than 1. In this case, a
multivariate smoother will be employed for estimating $\phi(\cdot)$. The asymptotic results for the
parameter estimates of $\beta$ and $\theta$ remain unchanged, except that the rate of convergence for
the estimate of $\phi(\cdot)$ changes with the dimension $d$.

Remark 3 Other dimension reduction approaches, such as MAVE (Xia et al., 2002) and
other variants of SIR, such as SIR2 (Li, 1991) and SAVE (Cook and Weisberg, 1991), could
be employed in Steps 1 and 4 for the case of $q = d = 1$ in (1.2), especially when SIR fails
for the case of a symmetric design of $X$. While MAVE is perhaps the most efficient method
of all, the benefits over SIR are limited, as all estimates are updated in Stage 2, and it is
in this step where the major efficiency gains occur. In addition, MAVE is computationally
more intensive than SIR and encounters difficulties in estimating $\beta_Z$, unless the covariate $Z$
is one-dimensional and the dimension $d$ of $\beta_Z$ is also small. In fact, the $\sqrt{n}$-consistency may
not hold when $d > 3$ in (1.2), as shown in Xia, Tong, Li and Zhu (2002).

Also, SIR2/SAVE was shown in Li and Zhu (2007) not to be $\sqrt{n}$-consistent unless a
bias correction is performed. In contrast, either SIR or pHd (Li, 1992) can be employed to
identify the directions when $d > 1$ and $q = 1$, and both lead to $\sqrt{n}$-consistency.

Remark 4 When the dimension $q$ of $Z$ is greater than 1, a multivariate extension of
SIR (Li et al., 2003) can be employed conceptually in Step 1 of the algorithm. However,
the number of observations per slice may become sparse, so we recommend an alternative
multivariate approach as in Li, Wen and Zhu (2008) or Zhu, Zhu, Ferre and Wang (2008) in
Step 1.

Remark 5 The single-index assumption in (1.1) can be easily extended to multiple
indices through SIR or its variants, but the estimation of the multivariate link function $g$
would encounter the curse of dimensionality. Since no more than three indices will be
needed in many applications, the approach in this paper can indeed be extended in practice
to multiple indices.

2.2 Main results 

In this section, the $\sqrt{n}$-asymptotics for the initial estimates of $\beta_0$ and $\beta_Z$ in Stage 1 are
taken for granted as they follow from existing results, so we do not formally list the
assumptions needed for this to hold but provide sources after Theorem 1 below. However,
the asymptotics for the initial estimate of $g$ and each of the parametric and nonparametric
estimates in Stage 2 are fully developed in this section, with detailed assumptions listed for
each estimator.

In order to study the asymptotic behavior of the estimators, we list the following conditions:

C1. (i) The distribution of $X$ has a compact support set $A$.

(ii) The density function of $X^T\beta$ is positive and satisfies a Lipschitz condition of order
1 for $\beta$ in a neighborhood of $\beta_0$. Further, $X^T\beta_0$ has a positive and bounded density
function $f(t)$ on $\mathcal{T}$, where $\mathcal{T} = \{t = x^T\beta_0 : x \in A\}$.

C2. (i) The functions $g$ and $g_{2i}$ have two bounded and continuous derivatives, where $g_{2i}$ is
the $i$th component of $g_2(t)$, $1 \le i \le q$;

(ii) $g_{3j}$ satisfies a Lipschitz condition of order 1, where $g_{3j}$ is the $j$th component of $g_3(t)$,
and $g_3(t) = E(X \mid X^T\beta_0 = t)$, $1 \le j \le p$.
C3. (i) The kernel $K$ is a bounded, continuous and symmetric probability density function,
satisfying

$$\int_{-\infty}^{\infty} u^2 K(u)\,du \ne 0, \quad \int_{-\infty}^{\infty} |u|^2 K(u)\,du < \infty;$$

(ii) $K$ satisfies a Lipschitz condition on $\mathbb{R}^1$.

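C4. (i) $\sup_t E(\|Z\|^2 \mid X^T\beta_0 = t) < \infty$;

(ii) $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2 < \infty$, $E(\varepsilon^4) < \infty$.

C5. (i) $nh^2/\log^2 n \to \infty$, $\limsup_{n\to\infty} nh^5 < \infty$;

(ii) $nhh_1^3/\log^2 n \to \infty$, $nh^4 \to 0$, $\limsup_{n\to\infty} nh_1^5 < \infty$.

C6. (i) $\Sigma = \mathrm{Cov}(Z - E(Z \mid X^T\beta_0))$ is a positive definite matrix;

(ii) $V = E\left[g'(X^T\beta_0)^2 J_{\beta_0^{(r)}}^T X X^T J_{\beta_0^{(r)}}\right]$ is a positive definite matrix, where $J_{\beta_0^{(r)}}$ is defined
by (2.8).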

Remark 6 The Lipschitz condition and the two derivatives in C1 and C2 are standard
smoothness conditions. C3 is the usual assumption for second-order kernels. C1 is used to
bound the density function of $X^T\beta$ away from zero. This ensures that the denominators
of $\hat g(t; \beta, \theta_0)$ and $\hat g'(t; \beta, \theta_0)$ are, with high probability, bounded away from 0 for $t = x^T\beta$,
$x \in A$ and $\beta$ near $\beta_0$. C4 is a necessary condition for the asymptotic normality of an
estimator. In C5(i), the range of $h$ for the estimators $\hat\theta$ and $\hat g$ is fairly large and contains the
rate $n^{-1/5}$ of "optimal" bandwidths. However, when analyzing the asymptotic properties
of the estimator $\hat\beta$ of $\beta_0$, we have to estimate the derivative $g'$ of $g$. As is well known, the
convergence rate of the estimator of $g'$ is slower than that of the estimator of $g$ if the same
bandwidth is used. This leads to a slower convergence rate for $\hat\beta$ than $\sqrt{n}$, unless we use a
kernel of order 3 or undersmoothing to deal with the bias of the estimator. This motivates
the introduction of another bandwidth $h_1$ in C5(ii) to control the variability of the estimator
of $g'$, and condition C5(ii) for the bandwidths $h$ and $h_1$. Chiou and Müller (1998) also consider
the use of two bandwidths to construct the estimator of $\beta$ in a related model. C6 ensures
that the limiting variances for the estimators $\hat\theta$ and $\hat\beta$ exist.

The following theorems state the asymptotic behavior of the estimators proposed in
Section 2.1. We first establish the asymptotic efficiency of $\hat\theta$.

Theorem 1 Suppose that conditions C1, C2(i), C3(i), C4(i), C5(i) and C6(i) hold.
When $\|\hat\beta_Z - \beta_Z\| = O_P(n^{-1/2})$ and $\|\hat\beta_0 - \beta_0\| = O_P(n^{-1/2})$, we have

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{D} N(0, \sigma^2\Sigma^{-1}).$$

Remark 7 Carroll et al. (1997) give similar results with $\beta = 1$ and $p = 1$ (the case of
a partially linear model). Theorem 1 generalizes their Theorems 2 and 3.

In Theorem 1, when we start with $\sqrt{n}$-consistent estimators for $\beta_Z$ and $\beta_0$, $\hat\theta$ is consistent
for $\theta_0$ with the same asymptotic efficiency as an estimator that we would have obtained had
we known $\beta_0$ and $g$, and thus has the oracle property. Numerous examples of $\sqrt{n}$-consistent
estimators already exist in the literature. For instance, Hall (1989) showed that one can
obtain a $\sqrt{n}$-consistent estimator for $\beta_0$ using projection pursuit regression. Under the
linearity condition, which is slightly weaker than elliptical symmetry of $X$, Li (1991), Hsing
and Carroll (1992) and Zhu and Ng (1995) proved that SIR, proposed by Li (1991), leads to
a $\sqrt{n}$-consistent estimator of $\beta_Z$ and of $\beta_0$, the latter when $Z$ is not present in (1.1). Li and
Zhu (2007) further show that, when including a bias correction and under a condition almost
equivalent to normality of $X$, sliced average variance estimation (SAVE, Cook and Weisberg
1991) performs similarly. We expect the results for $\beta_0$ to hold when $Z$ is dependent on $X$,
provided a good estimator of $\beta_Z$ is available. Under very general regularity conditions and
for $q = 1$, Xia, Tong, Li and Zhu (2002) proposed the minimum average variance estimation
(MAVE) and Xia (2006) a refined version of MAVE, and both methods can provide $\sqrt{n}$-consistent
estimators for the single index $\beta_0$. However, there is no result in the literature
regarding MAVE when the dimension of $Z$ is larger than 1, and the $\sqrt{n}$-consistency needs
further study when $d$ is larger than or equal to 3, even for univariate $Z$. Therefore, for
general theory, SIR may be a good choice for the initial estimators of $\beta_Z$ and $\beta_0$.

Theorem 2 Suppose that conditions C1-C6 hold. If the $r$th component of $\beta_0$ is positive,
we have

$$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{D} N\left(0, \sigma^2 J_{\beta_0^{(r)}} V^{-1} Q V^{-1} J_{\beta_0^{(r)}}^T\right),$$

where $Q = E\left\{g'(X^T\beta_0)^2 J_{\beta_0^{(r)}}^T [X - E(X \mid X^T\beta_0)][X - E(X \mid X^T\beta_0)]^T J_{\beta_0^{(r)}}\right\}$, and $V$ and $J_{\beta_0^{(r)}}$ are
defined in condition C6.

From Härdle et al. (1993) and Carroll et al. (1997), we can see that the estimator $\hat\beta$ of $\beta_0$ in those papers
has an asymptotic variance that corresponds to a generalized inverse $\sigma^2 Q_1^{-}$, where

$$Q_1 = E\left\{g'(X^T\beta_0)^2 [X - E(X \mid X^T\beta_0)][X - E(X \mid X^T\beta_0)]^T\right\}.$$

Note that there may be infinitely many generalized inverses of $Q_1$, but there is a unique
generalized inverse associated with the Jacobian $J_{\beta_0^{(r)}}$. The following theorem shows that the
variance-covariance matrix in Theorem 2 is smaller than $\sigma^2 Q_1^{-}$, the variance associated with
$J_{\beta_0^{(r)}}$, in the sense that $\sigma^2 Q_1^{-} - \sigma^2 J_{\beta_0^{(r)}} V^{-1} Q V^{-1} J_{\beta_0^{(r)}}^T$ is a non-negative definite matrix. We
use the usual notation: for two non-negative definite matrices $A$ and $B$, $A \ge B$ denotes that $A - B$
is a non-negative definite matrix.

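Theorem 3 Under the conditions of Theorem 2, we have

i) there is a generalized inverse of $Q_1$ that is of the form $J_{\beta_0^{(r)}} Q^{-1} J_{\beta_0^{(r)}}^T$;

ii) $J_{\beta_0^{(r)}} Q^{-1} J_{\beta_0^{(r)}}^T \ge J_{\beta_0^{(r)}} V^{-1} Q V^{-1} J_{\beta_0^{(r)}}^T$.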

Remark 8 Theorem 3 shows that our estimator of $\beta_0$ is asymptotically more efficient
than those of Härdle et al. (1993) and of Carroll et al. (1997). In addition, Carroll et al. (1997)
use an iterated procedure to estimate $\beta_0$ and $\theta_0$, while our estimation procedure does not
require iteration.

From Theorem 2 we obtain an asymptotic result regarding the angle between $\hat\beta$ and $\beta_0$,
which can be used to study issues of sufficient dimension reduction (SDR). We refer to Cook
(1998, 2007) for more details.

Corollary 1 Suppose that the conditions of Theorem 2 hold. Then

$$\cos(\hat\beta, \beta_0) - 1 = O_P(n^{-1/2}),$$

where $\cos(\hat\beta, \beta_0)$ is the cosine of the angle between $\hat\beta$ and $\beta_0$.

The next two theorems provide the convergence rate of the estimator $\hat g^*(\cdot)$ of $g(\cdot)$ and
the asymptotic normality of the estimator of $\sigma^2$.

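Theorem 4 Suppose that the conditions of Theorem 1 hold. If $\|\hat\beta - \beta_0\| = O_P(n^{-1/2})$,
then

$$\sup_{(x,\beta) \in A_n} |\hat g^*(x^T\beta) - g(x^T\beta_0)| = O_P\left((nh/\log n)^{-1/2}\right),$$

where $A_n = \{(x, \beta) : (x, \beta) \in A \times \mathbb{R}^p, \|\beta - \beta_0\| \le cn^{-1/2}\}$ for a constant $c > 0$.

Theorem 5 Suppose that conditions C1-C6 hold and $0 < \mathrm{Var}(\varepsilon_1^2) < \infty$. Then

$$\sqrt{n}(\hat\sigma^2 - \sigma^2)/(\mathrm{Var}(\varepsilon_1^2))^{1/2} \xrightarrow{D} N(0, 1).$$

Note that $n^{-1}\widetilde Z^T\widetilde Z \xrightarrow{P} \Sigma$ by Lemma A.5 of the Appendix. By Theorems 1 and 5, we
obtain

$$(\widetilde Z^T\widetilde Z)^{1/2}(\hat\theta - \theta_0)/\hat\sigma \xrightarrow{D} N(0, I_q).$$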

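We are now in the position to construct confidence regions for $\theta_0$. From Theorem 10.2d
in Arnold (1981) we obtain the following result.

Theorem 6 Under the conditions of Theorem 5, we have

$$(\hat\theta - \theta_0)^T(\widetilde Z^T\widetilde Z)(\hat\theta - \theta_0)/\hat\sigma^2 \xrightarrow{D} \chi_q^2,$$

where $\chi_q^2$ is chi-square distributed with $q$ degrees of freedom. Let $\chi_q^2(1 - \alpha)$ be the $(1 - \alpha)$-quantile
of $\chi_q^2$ for $0 < \alpha < 1$. An asymptotic confidence region of $\theta_0$ is

$$R_\alpha = \{\theta : (\hat\theta - \theta)^T(\widetilde Z^T\widetilde Z)(\hat\theta - \theta)/\hat\sigma^2 \le \chi_q^2(1 - \alpha)\}.$$

To construct confidence regions for $\beta_0$, a plug-in estimator of the limiting variance of $\hat\beta$
is needed. We define the estimators $\hat V$ and $\hat Q$ of $V$ and $Q$, respectively, by

$$\hat V = \frac{1}{n}\sum_{i=1}^{n} \hat g'(X_i^T\hat\beta; \hat\beta, \hat\theta)^2 J_{\hat\beta^{(r)}}^T X_i X_i^T J_{\hat\beta^{(r)}}$$

and

$$\hat Q = \frac{1}{n}\sum_{i=1}^{n} \hat g'(X_i^T\hat\beta; \hat\beta, \hat\theta)^2 J_{\hat\beta^{(r)}}^T [X_i - \hat g_3(X_i^T\hat\beta; \hat\beta)][X_i - \hat g_3(X_i^T\hat\beta; \hat\beta)]^T J_{\hat\beta^{(r)}},$$

where $\hat g_3(t; \beta) = \sum_{i=1}^{n} W_{ni}(t; \beta)X_i$ is the estimator of $g_3(t) = E(X \mid X^T\beta_0 = t)$ and $J_{\hat\beta^{(r)}}$ is
the estimator of $J_{\beta_0^{(r)}}$. It is easy to prove that $J_{\hat\beta^{(r)}} \xrightarrow{P} J_{\beta_0^{(r)}}$, $\hat V \xrightarrow{P} V$ and $\hat Q \xrightarrow{P} Q$. Then
for any $p \times l$ matrix $A$ of full rank with $l < p$, Theorems 2 and 5 imply that

$$\left(n^{-1}A^T J_{\hat\beta^{(r)}}\hat V^{-1}\hat Q\hat V^{-1}J_{\hat\beta^{(r)}}^T A\right)^{-1/2} A^T(\hat\beta - \beta_0)/\hat\sigma \xrightarrow{D} N(0, I_l).$$

We again use Theorem 10.2d in Arnold (1981) to obtain the following limiting distribution.

Theorem 7 Suppose that the conditions of Theorem 5 hold. Then

$$(\hat\beta - \beta_0)^T A\left(n^{-1}A^T J_{\hat\beta^{(r)}}\hat V^{-1}\hat Q\hat V^{-1}J_{\hat\beta^{(r)}}^T A\right)^{-1} A^T(\hat\beta - \beta_0)/\hat\sigma^2 \xrightarrow{D} \chi_l^2.$$

Letting $\chi_l^2(1 - \alpha)$ be the $(1 - \alpha)$-quantile of $\chi_l^2$ for $0 < \alpha < 1$, the asymptotic confidence
region of $A^T\beta_0$ is

$$\widetilde R_\alpha = \left\{A^T\beta : (\hat\beta - \beta)^T A\left(n^{-1}A^T J_{\hat\beta^{(r)}}\hat V^{-1}\hat Q\hat V^{-1}J_{\hat\beta^{(r)}}^T A\right)^{-1} A^T(\hat\beta - \beta)/\hat\sigma^2 \le \chi_l^2(1 - \alpha)\right\}.$$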
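For a scalar linear coefficient ($q = 1$), the confidence region for $\theta_0$ given above reduces to an interval, because the $(1 - \alpha)$-quantile of a chi-square distribution with one degree of freedom is the square of the $(1 - \alpha/2)$-quantile of the standard normal. The sketch below illustrates only this special case; the function name and argument names are hypothetical, and the inputs are assumed to have been computed beforehand.

```python
import math
from statistics import NormalDist


def theta_confidence_interval(theta_hat, z_tilde, sigma_hat, alpha=0.05):
    """Asymptotic (1 - alpha) interval for a scalar theta_0.

    z_tilde holds the centered covariates Z~_i, sigma_hat is the estimated
    error standard deviation. The interval collects all theta with
    (theta_hat - theta)^2 * sum(Z~_i^2) / sigma_hat^2 <= chi2_1(1 - alpha).
    """
    zz = sum(v * v for v in z_tilde)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # crit**2 = chi2_1 quantile
    half_width = crit * sigma_hat / math.sqrt(zz)
    return theta_hat - half_width, theta_hat + half_width
```

For $q > 1$ the region is an ellipsoid and a chi-square quantile with $q$ degrees of freedom is required, which the Python standard library does not provide.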



3 Simulation study 

In this section, we examine the performance of the procedures in Section 2 for the
estimation of both $\beta_0$ and $\theta_0$. We report the accuracy of estimators using PPR and SIR as
dimension-reduction methods. The sample size for the simulated data is $n = 100$ and the
number of simulated samples is 2000 for the parametric components. When SIR is applied,
using 5 or 10 slices generally yields good results. In other words, each slice
contains 10 to 20 points. A quadratic model of the form

$$Y = (X^T\beta_0 - 0.5)^2 + Z\theta_0 + 0.2\varepsilon$$

was used, where $\theta_0 = 1$ is a scalar, $\beta_0 = (0.75, 0.5, -0.25, -0.25, 0.25)^T$, $X$ is a 5-dimensional
vector with independent uniform $[0,1]$ components, and $\varepsilon$ is a standard normal variable.
The dependency between $X$ and $Z$ was prescribed by defining $Z$ as a binary variable with
probability $\exp(X^T\beta_Z)/(1 + \exp(X^T\beta_Z))$ to be 1 and 0 otherwise. Two extreme cases of
$\beta_Z$ are reported in Table 1 and Table 2, one based on choosing the same value as $\beta_0$ with
$\beta_Z = \beta_0$, and the other on $\beta_Z = (0.5, 0, 0.5, 0.5, -0.5)^T$, so that $\beta_Z$ is orthogonal to $\beta_0$. We
also checked scenarios where $\beta_Z$ and $\beta_0$ are neither orthogonal nor parallel to each other,
and the results are in agreement with the two extreme cases reported here.
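The simulation design just described can be reproduced with a short data generator. This is a sketch under the assumption that the quadratic model reads $Y = (X^T\beta_0 - 0.5)^2 + Z\theta_0 + 0.2\varepsilon$; the function name `simulate`, the constant names and the seeding scheme are illustrative choices, not part of the original study.

```python
import math
import random

BETA0 = [0.75, 0.5, -0.25, -0.25, 0.25]      # index direction, unit norm
BETA_Z_ORTH = [0.5, 0.0, 0.5, 0.5, -0.5]     # orthogonal-case direction for Z
THETA0 = 1.0                                 # scalar linear coefficient


def simulate(n, beta_z, seed=0):
    """Draw n observations (x, z, y) from the simulation model.

    x has five independent Uniform[0, 1] components, z is Bernoulli with
    logistic success probability in x^T beta_z, and y follows the
    quadratic partial-linear single-index model with noise scale 0.2.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        idx_z = sum(a * b for a, b in zip(x, beta_z))
        prob = 1.0 / (1.0 + math.exp(-idx_z))   # equals exp(t) / (1 + exp(t))
        z = 1 if rng.random() < prob else 0
        idx = sum(a * b for a, b in zip(x, BETA0))
        y = (idx - 0.5) ** 2 + THETA0 * z + 0.2 * rng.gauss(0.0, 1.0)
        data.append((x, z, y))
    return data
```

Calling `simulate(100, BETA0)` gives the parallel case and `simulate(100, BETA_Z_ORTH)` the orthogonal case.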

For the smoothing steps, we used a local linear smoother with a Gaussian kernel throughout.
A product Gaussian kernel was used when bivariate smoothing was involved, and equal
bandwidths were selected for each kernel to save computing time. A pilot study revealed
that the bandwidth chosen at the first stage to estimate the residual $\eta$ has little effect on
the accuracy of the final estimates of $\theta_0$, so we chose an initial bandwidth of 0.5 to estimate
$\phi$ in (1.2), as this value was frequently selected by generalized cross validation (GCV). The
subsequent smoothing steps utilized the GCV method as proposed in Craven and Wahba
(1979). For instance, when estimating $g$ and $\theta_0$ in the second stage, the GCV statistic is
given by the formula

$$\mathrm{GCV}(h) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Z_i^T\hat\theta - \hat g_h(X_i^T\hat\beta; \hat\beta, \hat\theta)\right)^2}{\left(n^{-1}\mathrm{tr}(I - S_h)\right)^2}, \qquad (3.1)$$

where $\hat g_h(\cdot)$ is the estimator of $g(\cdot)$ with a bandwidth $h$ and $S_h$ is the smoothing matrix
corresponding to a bandwidth of $h$. The GCV bandwidth was selected to minimize (3.1).
We use the optimal bandwidth, $h_{opt}$, for $g$ and $\theta$. When calculating the estimator $\hat\beta$, we
chose the bandwidths,

\[ h = h_{opt}\, n^{1/5} n^{-1/3} = h_{opt}\, n^{-2/15} \quad\text{and}\quad h_1 = h_{opt}, \tag{3.2} \]

respectively, because this guarantees that the required bandwidth has the correct order of
magnitude for optimal asymptotic performance [see Carroll et al. (1997), Stute and Zhu (2005),
and Zhu and Ng (2003)]. Note that choices (3.2) satisfy condition C5(ii). Relevant discussion
on choosing two distinct bandwidths can be found in Chiou and Müller (1998).
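The GCV selection step can be illustrated with a short sketch. It is a simplification under stated assumptions rather than the authors' implementation: it smooths a generic response `y` (for example, a partial residual such as $Y_i - Z_i^T\hat\theta$) on a scalar index `t` (for example, $X_i^T\hat\beta$) with a Gaussian-kernel local linear smoother, and the function names are hypothetical.

```python
import numpy as np

def local_linear_smoother_matrix(t, h):
    """Smoothing matrix S_h of a local linear fit with a Gaussian kernel."""
    d = t[None, :] - t[:, None]                   # d[i, j] = t_j - t_i
    w = np.exp(-0.5 * (d / h) ** 2)               # Gaussian kernel weights
    s0 = w.sum(axis=1, keepdims=True)
    s1 = (w * d).sum(axis=1, keepdims=True)
    s2 = (w * d ** 2).sum(axis=1, keepdims=True)
    # Closed-form local linear weights; each row sums to one.
    return w * (s2 - d * s1) / (s0 * s2 - s1 ** 2)

def gcv_score(t, y, h):
    """Mean squared residual divided by {tr(I - S_h) / n}^2."""
    S = local_linear_smoother_matrix(t, h)
    n = len(t)
    mean_sq_resid = np.mean((y - S @ y) ** 2)
    return mean_sq_resid / (1.0 - np.trace(S) / n) ** 2

def select_bandwidth(t, y, grid):
    """Return the grid value with the smallest GCV score."""
    scores = [gcv_score(t, y, h) for h in grid]
    return grid[int(np.argmin(scores))]
```

A grid search is used here only for simplicity; any one-dimensional minimizer over $h$ would serve the same purpose.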
In the simulation, PPR and SIR were used to obtain the initial estimators of $\beta_0$ and
$\beta_Z$. The notation SIR$c$ means that when we used SIR to estimate $\beta_Z$, the number of data
points per slice is $c$. The resulting estimates for $\theta_0$ and the one-step iterated estimates are
summarized in Tables 1 and 2, where we report bias, standard deviation (SD), and mean
square error (MSE). The case with known $\beta_0$ is also reported in the last row and serves as
a gold standard. The right columns under "One-step iterated estimate" in Tables 1 and 2
represent the results obtained when iterating the algorithms in Section 2.1 one more time
after obtaining the estimates in the left columns.
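For readers unfamiliar with the "points per slice" convention, the following is a generic sketch of sliced inverse regression for a single direction, in the spirit of Li (1991). It is not the code used in the study; the function name and interface are hypothetical, and slices are formed from the ordered responses with `points_per_slice` observations each.

```python
import numpy as np

def sir_first_direction(X, y, points_per_slice):
    """Leading SIR direction, using slices of a fixed number of points."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whitening matrix cov^{-1/2} from the eigen-decomposition of cov.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    X_std = (X - mean) @ inv_sqrt
    order = np.argsort(y)                       # slice on the ordered response
    M = np.zeros((p, p))
    for start in range(0, n, points_per_slice):
        idx = order[start:start + points_per_slice]
        m = X_std[idx].mean(axis=0)             # slice mean of standardized X
        M += (len(idx) / n) * np.outer(m, m)    # weighted covariance of slice means
    _, v = np.linalg.eigh(M)                    # eigenvalues in ascending order
    direction = inv_sqrt @ v[:, -1]             # map back to the original scale
    return direction / np.linalg.norm(direction)
```

With $n = 100$, calling this with `points_per_slice=5` or `10` gives 20 or 10 slices. The sign of the returned direction is arbitrary.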

Tables 1 and 2 are about here 

From Tables 1 and 2 we find that the three methods have small mean square errors, with
projection pursuit regression outperforming both SIR procedures. This is expected, as the
simulated model structure satisfies the additive assumption of PPR and the estimates of the
$\beta$-directions were iteratively updated through estimates of the unknown link functions,
including $g$. In other non-additive situations, SIR might be more reliable than PPR. Iterated
estimates reduced the bias in all cases and improved the MSE markedly in the orthogonal case;
in the parallel case the MSE of the SIR-based estimates changed only slightly. Compared to
the case when $\beta_0$ is known, PPR typically attains 80% or more of the efficiency after one
iteration.
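As a quick arithmetic check on these statements, the reported MSE values can be compared with bias$^2$ + SD$^2$, and the efficiency of PPR can be computed as a ratio of MSEs. The numbers below are read from Tables 1 and 2; defining efficiency as MSE(known $\beta_0$) / MSE(method) is our assumption, since the text does not spell it out.

```python
# (case, method, stage, bias, SD, reported MSE), as read from Tables 1 and 2.
ROWS = [
    ("parallel", "PPR", "initial", 0.0058, 0.0706, 0.00502),
    ("parallel", "PPR", "iterated", 0.0046, 0.0701, 0.00493),
    ("parallel", "SIR5", "initial", 0.0095, 0.0862, 0.00753),
    ("parallel", "SIR5", "iterated", 0.0083, 0.0869, 0.00762),
    ("parallel", "SIR10", "initial", 0.0113, 0.0788, 0.00634),
    ("parallel", "SIR10", "iterated", 0.0098, 0.0808, 0.00663),
    ("parallel", "given", "initial", 0.0031, 0.0660, 0.00436),
    ("orthogonal", "PPR", "initial", -0.0087, 0.0972, 0.00952),
    ("orthogonal", "PPR", "iterated", -0.0047, 0.0711, 0.00508),
    ("orthogonal", "SIR5", "initial", -0.0115, 0.1395, 0.01960),
    ("orthogonal", "SIR5", "iterated", -0.0072, 0.0919, 0.00850),
    ("orthogonal", "SIR10", "initial", -0.0102, 0.1362, 0.01865),
    ("orthogonal", "SIR10", "iterated", -0.0083, 0.0959, 0.00926),
    ("orthogonal", "given", "initial", -0.0024, 0.0696, 0.00485),
]

def mse_from_bias_sd(bias, sd):
    """MSE decomposition: squared bias plus variance."""
    return bias ** 2 + sd ** 2

def max_discrepancy(rows):
    """Largest gap between reported MSE and bias^2 + SD^2."""
    return max(abs(mse_from_bias_sd(b, s) - m) for *_, b, s, m in rows)

def relative_efficiency(rows, case, method, stage="iterated"):
    """Assumed definition: MSE with known beta_0 divided by MSE of the method."""
    lookup = {(c, m, st): mse for c, m, st, _, _, mse in rows}
    return lookup[(case, "given", "initial")] / lookup[(case, method, stage)]
```

Under this definition, the one-step iterated PPR estimate has relative efficiency of roughly 0.88 in the parallel case and 0.95 in the orthogonal case, and every reported MSE agrees with bias$^2$ + SD$^2$ up to rounding.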

For the estimation of $\beta_0$, we computed the angle (in radians) between $\hat\beta$ and $\beta_0$ as a
measure of accuracy. The mean, standard deviation (SD), and mean squared error (MSE)
of the angle between $\hat\beta$ and $\beta_0$ are reported in Table 3. Here, PPR leads to far superior
estimates compared to SIR.
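The angle criterion is straightforward to compute. The sketch below uses hypothetical helper names and treats the MSE of the angle as the mean squared angle (the target angle being zero), which is an assumption consistent with the summary statistics named above.

```python
import numpy as np

def angle_between(b_hat, b0):
    """Angle in radians between two nonzero vectors."""
    c = np.dot(b_hat, b0) / (np.linalg.norm(b_hat) * np.linalg.norm(b0))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards against rounding

def summarize_angles(angles):
    """Mean, SD and MSE of a collection of angles (target angle is 0)."""
    a = np.asarray(angles, dtype=float)
    return {"mean": a.mean(), "sd": a.std(), "mse": np.mean(a ** 2)}
```

With the population standard deviation used here, the summary satisfies MSE = mean$^2$ + SD$^2$ exactly.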

Table 3 is about here 

The performance of the nonparametric estimates for $g$ is demonstrated in Figure 1. Again,
GCV was used for bandwidth choice and compared to the estimates based on the optimal
fixed bandwidth. The true function $g$ and the mean of each estimated $g^*$-function over the
2000 replicates are plotted. In general, GCV seems to work well for all parametric and
nonparametric components. This is consistent with the results reported in Chen and Shiau
(1994) for the analysis of partially linear models based on generalized cross validation (GCV).
Theoretical properties of the current models in regard to GCV will be a topic for further
investigation.



Figure 1 is about here 



A final remark is that we tried to compare our procedure with that proposed in Carroll et
al. (1997), for the quadratic model used in the above simulations with $\beta_Z$ and $\beta_0$ orthogonal.
However, we were not able to obtain any results for the method in Carroll et al. (1997), as
their procedure seems to be very sensitive to the choice of the initial estimates. We then used
our estimates for $\beta_0$ and $\theta_0$ as the initial values for their procedure. Nevertheless, we were still
unable to obtain any meaningful comparison results, as out of the seven attempted trials their
procedure crashed six times on the first simulation and once on the second simulation. Since
$\theta_0$ is only a scalar, we postulate that their procedure has difficulties with high dimensional
$\beta_0$, which is here a five-dimensional vector.



4 Data Example 

We analyze the Boston Housing data mentioned in Section 1. The goal is to determine the 
effect of the various variables on housing price, including a binary variable, which describes 
whether the census tract borders the Charles River. According to Harrison and Rubinfeld 
(1978), bordering the river should have a positive effect on the median housing price of the 

census tract. They used a linear model that included a log transformation for the response 
variable and three of the covariates, and power transformations for three other covariates. 
Their final model is

\[ \begin{aligned} \log(MV) ={}& a_1 + a_2 RM^2 + a_3 AGE + a_4 \log(DIS) + a_5 \log(RAD) + a_6 TAX \\ &+ a_7 PTRATIO + a_8 (B - 0.63)^2 + a_9 \log(LSTAT) + a_{10} CRIM \\ &+ a_{11} ZN + a_{12} INDUS + a_{13} CHAS + a_{14} NOX^p + e. \end{aligned} \]

The coefficient $a_{13}$ is estimated to be 0.088, which is significant with a p-value of less than
0.01 for the hypothesis $H_0 : a_{13} = 0$ versus $H_1 : a_{13} \neq 0$. The coefficient of determination
$R^2$ attained by their analysis is 0.81, where $R^2$ is the squared correlation between the true
dimension-reduction variable $X^T\beta_0$ and the estimated dimension-reduction variable $X^T\hat\beta$.

This data set was also analyzed by Chen and Li (1998), who used sliced inverse regression
with all thirteen covariates. After examining the initial results, Chen and Li (1998) trimmed
the data and then dropped some of the variables. We fit the data on the first SIR direction
of the initial analysis reported in their article and obtained an $R^2$ of 0.705 using GCV
bandwidth 0.43. Note that the assumptions of sliced inverse regression are probably not met
because some of the covariates are discrete. We thus propose to use a partial-linear
single-index model. Several choices of $Z$ were attempted, but they did not yield better results, in
terms of $R^2$, than the one using only the Charles River variable as $Z$ and the other covariates
as $X$. We thus focus on this model, where a log transformation was applied to $Y$.

To select the number of observations per slice in the dimension reduction step of SIR, we
borrow from our experience in the simulation presented in Section 3, where 5 or 10 observations
per slice worked well for a total sample size of 100, leading to 20 or 10 slices. Since
the sample size for the housing data is much larger, we use SIR with 20 data points per slice
and this leads to a total of 26 slices. As Chen and Li (1998) pointed out, SIR is not sensitive
to the choice of slice number; they tried slicing with 10 or 30 points per slice, leading to
50 or 17 slices, and obtained very similar results. The GCV bandwidth for estimating $g$ and
$\theta$ is 0.367, which is smaller than the bandwidth 0.43 chosen by the GCV method for the
SIR approach of Chen and Li (1998). To estimate $\beta$ by (2.10), the bandwidths selected by
(3.2) are $h = 0.16$ and $h_1 = 0.367$. The $R^2$ is 0.8047, which is essentially equal to that
obtained by Harrison and Rubinfeld and higher than that using SIR on all thirteen variables.
The value of the test statistic for $H_0 : \theta = 0$ versus $H_1 : \theta \neq 0$ is 3.389 when the degrees
of freedom are calculated according to Hastie and Tibshirani, and 3.419 when $n$ degrees of
freedom are used. Either way the result is significant with p-value < 0.01.
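The bandwidths quoted for estimating $\beta$ can be reproduced from the GCV bandwidth with rule (3.2), read here as $h = h_{opt} n^{1/5} n^{-1/3}$ and $h_1 = h_{opt}$. The sketch below is a hedged check: the function name is hypothetical, and $n = 506$ is the commonly cited number of census tracts in the Boston Housing data, which the text does not state explicitly.

```python
def beta_bandwidths(h_opt, n):
    """Bandwidths (h, h1) for estimating beta, following our reading of (3.2)."""
    h = h_opt * n ** (1 / 5) * n ** (-1 / 3)  # equals h_opt * n ** (-2 / 15)
    h1 = h_opt
    return h, h1

# Assumed sample size for the Boston Housing data (not stated in the text).
h, h1 = beta_bandwidths(0.367, 506)
```

Under these assumptions `h` rounds to 0.16 and `h1` stays at 0.367, in line with the values reported above.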

We also omitted the Charles River variable and used a dimension-reduction model on $Y$
and $X$. After obtaining an estimate for $\beta_0$, we then estimate the relationship between $Y$
and $X^T\hat\beta$. GCV yields a bandwidth of 0.16, and we obtain $R^2 = 0.8021$. Even though the
Charles River variable is significant, its inclusion leads to only a minor increase in $R^2$.

Figure 2 is about here 

Figure 2 shows the estimated $g$ along with the data. The x-axis of the graph gives
the estimated value $x^T\hat\beta$, and the y-axis gives the estimated value $g^*(t)$. Figure 2
shows a downward trend in the effective dimension reduction (EDR) variate obtained. The
upward curvature of the function at high values of the EDR variate may or may not be a
real effect.

The advantage of our procedure over the one used by Harrison and Rubinfeld is that 
Harrison and Rubinfeld have to make choices regarding transformations for every variable 
in the model. We only need to choose the bandwidth or bandwidths used for smoothing. 

5 Proofs of Theorems 

Since the proofs of the theorems are rather long, the proofs of Theorems 1-4 are presented 
in this section, and more details of the proofs are divided into Lemmas A.2-A.7 in the 
Appendix. 

In this section and the Appendix, we use $c > 0$ to represent any constant which may take
different values for each appearance, and $a \wedge b = \min(a, b)$.

Proof of Theorem 1. Denote

\[ G = \big(g(X_1^T\beta_0) - \hat g(X_1^T\beta_0; \beta_0, \theta_0), \ldots, g(X_n^T\beta_0) - \hat g(X_n^T\beta_0; \beta_0, \theta_0)\big)^T. \]
From (2.6) we have



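Lemma A.5 in the Appendix implies

\[ n(\tilde Z^T\tilde Z)^{-1} \xrightarrow{P} \Sigma^{-1}. \tag{5.1} \]
Therefore, Lemma A.6 in the Appendix leads to

\[ \sqrt{n}\,(\tilde Z^T\tilde Z)^{-1}\tilde Z^T G \xrightarrow{P} 0. \]

It remains to show that

\[ \sqrt{n}\,(\tilde Z^T\tilde Z)^{-1}\tilde Z^T e \xrightarrow{D} N(0, \sigma^2\Sigma^{-1}). \tag{5.2} \]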

Since 

n n 

=: $M_1 + M_2$.

The central limit theorem implies $n^{-1/2}M_1 \xrightarrow{D} N(0, \sigma^2\Sigma)$. Similarly to the proof of (A.17),
it is easy to obtain that $n^{-1/2}M_2 \xrightarrow{P} 0$. This together with (5.1) and Slutsky's Theorem
proves (5.2), and hence Theorem 1. □

Proof of Theorem 2. The proof is divided into two steps: from (2.9), step (I) provides
the existence of the least squares estimator $\hat\beta$ of $\beta_0$, and from (3.1), step (II) proves the
asymptotic normality of $\hat\beta$.

(I) Proof of existence. We prove the following fact: under conditions C1-C5 and
with probability one there exists an estimator of $\beta_0$ minimizing expression (2.9) in $B_{1n}$,
where $B_{1n} = \{\beta : \|\beta - \beta_0\| = B_1 n^{-1/2}\}$ for some constant $B_1$ such that $0 < B_1 < \infty$.
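In fact, let $Y = (Y_1, \ldots, Y_n)^T$ and $Z = (Z_1, \ldots, Z_n)^T$. We have
\[ \begin{aligned} D(\beta) ={}& (Y - Z\hat\theta)^T (I - S_\beta)^T (I - S_\beta)(Y - Z\hat\theta) \\ ={}& (Y - Z\theta_0)^T (I - S_\beta)^T (I - S_\beta)(Y - Z\theta_0) \\ &- 2(Y - Z\theta_0)^T (I - S_\beta)^T (I - S_\beta) Z(\hat\theta - \theta_0) \\ &+ \{Z(\hat\theta - \theta_0)\}^T (I - S_\beta)^T (I - S_\beta) Z(\hat\theta - \theta_0) \\ =:{}& D_1(\beta) - D_2(\beta) + D_3(\beta). \end{aligned} \]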

The same arguments as in the proof of Theorem 1 can be used to obtain that $D_2(\beta) = R_0 + o_P(1)$
and $D_3(\beta) = o_P(1)$, where $R_0$ is a constant independent of $\beta$. This implies
$D(\beta) = D_1(\beta) - R_0 + o_P(1)$. Thus, minimizing $D(\beta)$ simultaneously with respect to $\beta$ is very
much like separately minimizing $D_1(\beta)$ with respect to $\beta$. It follows from (2.7) that we only
need to prove the existence of an estimator of $\beta_0^{(r)}$ in $B_{2n}$, where $B_{2n} = \{\beta^{(r)} : \|\beta^{(r)} - \beta_0^{(r)}\| = B_2 n^{-1/2}\}$
for some constant $B_2$ such that $0 < B_2 < \infty$. Note that $R(\beta^{(r)}) = -\frac{1}{2}\,\partial D_1(\beta)/\partial\beta^{(r)}$, where
$R(\beta^{(r)})$ is defined in (A.19) of Lemma A.7. For an arbitrary $\beta^{(r)} \in B_{2n}$, with the value of the
constant $B_2$ in $B_{2n}$ to be determined, we have from Lemma A.7 below that

\[ (\beta^{(r)} - \beta_0^{(r)})^T R(\beta^{(r)}) = (\beta^{(r)} - \beta_0^{(r)})^T U(\beta_0^{(r)}) - n(\beta^{(r)} - \beta_0^{(r)})^T V (\beta^{(r)} - \beta_0^{(r)}) + o_P(1). \tag{5.3} \]

The following arguments are similar to those used by Weisberg and Welsh (1994), which
in turn use (6.3.4) of Ortega and Rheinboldt (1973). We note that (5.3) is dominated
by the term of order $B_2^2$ because $\sqrt{n}\,\|\beta^{(r)} - \beta_0^{(r)}\| = B_2$, whereas $|(\beta^{(r)} - \beta_0^{(r)})^T U(\beta_0^{(r)})| = B_2 O_P(1)$
and $n(\beta^{(r)} - \beta_0^{(r)})^T V (\beta^{(r)} - \beta_0^{(r)}) \sim B_2^2$. So, for any given $\eta > 0$, if $B_2$ is chosen large enough,
then it follows that $(\beta^{(r)} - \beta_0^{(r)})^T R(\beta^{(r)}) < 0$ on an event with probability $1 - \eta$. From
the arbitrariness of $\eta$, we can prove the existence of the least squares estimator of $\beta_0^{(r)}$ in $B_{2n}$
as in the proof of Theorem 5.1 of Welsh (1989). The details are omitted.

(II) Proof of asymptotic normality. From step (I) we find that $\hat\beta^{(r)}$ is a solution in
$B_{2n}$ to the equation $R(\beta^{(r)}) = 0$. That is, $R(\hat\beta^{(r)}) = 0$. By Lemma A.7, we have

\[ 0 = U(\beta_0^{(r)}) - nV(\hat\beta^{(r)} - \beta_0^{(r)}) + o_P(\sqrt{n}), \]
and hence

\[ \sqrt{n}\,(\hat\beta^{(r)} - \beta_0^{(r)}) = V^{-1} n^{-1/2} U(\beta_0^{(r)}) + o_P(1). \]
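We now consider the estimator $\hat\beta$. A simple calculation yields

\[ \|\hat\beta^{(r)}\|^2 - \|\beta_0^{(r)}\|^2 = 2\beta_0^{(r)T}(\hat\beta^{(r)} - \beta_0^{(r)}) + O_P(n^{-1}), \]

and hence

\[ \sqrt{1 - \|\hat\beta^{(r)}\|^2} - \sqrt{1 - \|\beta_0^{(r)}\|^2} = -\frac{\|\hat\beta^{(r)}\|^2 - \|\beta_0^{(r)}\|^2}{\sqrt{1 - \|\hat\beta^{(r)}\|^2} + \sqrt{1 - \|\beta_0^{(r)}\|^2}} = -\frac{\beta_0^{(r)T}(\hat\beta^{(r)} - \beta_0^{(r)})}{\sqrt{1 - \|\beta_0^{(r)}\|^2}} + O_P(n^{-1}). \]

It follows from (2.7) and the above equation that

\[ \hat\beta - \beta_0 = \Big(\hat\beta_1 - \beta_{01}, \ldots, \hat\beta_{r-1} - \beta_{0(r-1)}, \sqrt{1 - \|\hat\beta^{(r)}\|^2} - \sqrt{1 - \|\beta_0^{(r)}\|^2}, \hat\beta_{r+1} - \beta_{0(r+1)}, \ldots, \hat\beta_p - \beta_{0p}\Big)^T. \]

That is, from the definition of $J_{\beta^{(r)}}$ in (2.8), we have

\[ \hat\beta - \beta_0 = J_{\beta_0^{(r)}}(\hat\beta^{(r)} - \beta_0^{(r)}) + O_P(n^{-1}). \]

Thus, we have

\[ \sqrt{n}\,(\hat\beta - \beta_0) = J_{\beta_0^{(r)}} V^{-1} n^{-1/2} U(\beta_0^{(r)}) + o_P(1). \]

Theorem 2 follows from this, the central limit theorem, and Slutsky's Theorem. □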
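Proof of Theorem 3. Recalling the definition of $Q$, we can see that $Q = J^T_{\beta_0^{(r)}} Q_1 J_{\beta_0^{(r)}}$.

Define

\[ \Pi_0 = J_{\beta_0^{(r)}} Q^{-1} J^T_{\beta_0^{(r)}}. \]

We now prove that $\Pi_0$ is a generalized inverse of $Q_1$. To this end, we need to prove that
$\Pi_0 Q_1 \Pi_0 = \Pi_0$ and $Q_1 \Pi_0 Q_1 = Q_1$. Note that

\[ \Pi_0 Q_1 \Pi_0 = J_{\beta_0^{(r)}} Q^{-1} J^T_{\beta_0^{(r)}} Q_1 J_{\beta_0^{(r)}} Q^{-1} J^T_{\beta_0^{(r)}} = J_{\beta_0^{(r)}} Q^{-1} J^T_{\beta_0^{(r)}} = \Pi_0. \]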

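We now prove $Q_1 \Pi_0 Q_1 = Q_1$. First, by QR decomposition (see, e.g., Gentle 1998, Section
3.2.2, pages 95-97 for more details) for $\beta_0$, we can find its orthogonal complement $b_1$ such that
$B = (b_1, \beta_0)$ is an orthogonal matrix, and $\beta_0 = B \binom{0}{1}$. Thus,

\[ J_{\beta_0^{(r)}} = B B^T J_{\beta_0^{(r)}} =: BR, \quad\text{where } R = \binom{R_1}{R_2} \]

with $R_1$ being a $(p-1) \times (p-1)$ nonsingular matrix. Further, note that

\[ Q = J^T_{\beta_0^{(r)}} Q_1 J_{\beta_0^{(r)}} = R^T B^T Q_1 B R = R^T \begin{pmatrix} b_1^T Q_1 b_1 & 0 \\ 0 & 0 \end{pmatrix} R = R_1^T b_1^T Q_1 b_1 R_1. \]

To prove the result, we rewrite $Q_1$ in another form. Define $S = B \begin{pmatrix} R_1 & 0 \\ 0 & 1 \end{pmatrix}$. $S$ is a
nonsingular matrix. Then

\[ \begin{aligned} Q_1 &= (S^T)^{-1} S^T Q_1 S S^{-1} = (S^T)^{-1} \begin{pmatrix} R_1^T & 0 \\ 0 & 1 \end{pmatrix} B^T Q_1 B \begin{pmatrix} R_1 & 0 \\ 0 & 1 \end{pmatrix} S^{-1} \\ &= (S^T)^{-1} \begin{pmatrix} R_1^T b_1^T Q_1 b_1 R_1 & 0 \\ 0 & 0 \end{pmatrix} S^{-1} = (S^T)^{-1} \begin{pmatrix} Q & 0 \\ 0 & 0 \end{pmatrix} S^{-1}. \end{aligned} \]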
We now prove that $Q_1 \Pi_0 Q_1 = Q_1$, with $Q_1$ of the above form. From the above and noting




Thus, $\Pi_0$ is one of the solutions of $Q_1^-$. To prove that the asymptotic variance-covariance
matrix $\sigma^2\Pi_1$ of our estimator is smaller than the corresponding matrix $\sigma^2\Pi_0$ given in Härdle
et al. (1993), we only need to show that $\Pi_0 - \Pi_1$ is a positive semi-definite matrix, that
is, $\Pi_0 \geq \Pi_1$. Recall that $V = J^T_{\beta_0^{(r)}} E\{g'(X^T\beta_0)^2 XX^T\} J_{\beta_0^{(r)}}$. Note that both $Q$ and
$V$ are positive definite matrices and obviously $V \geq Q$. Thus, $Q^{-1} \geq V^{-1}$, and then
$V^{-1} \geq V^{-1} Q V^{-1}$. From these two inequalities, it is easy to see that

\[ \Pi_0 \geq J_{\beta_0^{(r)}} V^{-1} J^T_{\beta_0^{(r)}} \geq J_{\beta_0^{(r)}} V^{-1} Q V^{-1} J^T_{\beta_0^{(r)}} = \Pi_1. \]

The proof is now complete. □ 
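The matrix orderings used in the last step can be checked numerically. The sketch below is only an illustration with randomly generated matrices: `Q`, `V` and `J` are stand-ins built so that $V \geq Q > 0$, not the quantities of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(k, rng):
    """Random positive semi-definite k x k matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T

def min_eig(M):
    """Smallest eigenvalue of the symmetrized matrix."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()

k = 4
Q = random_psd(k, rng) + np.eye(k)   # positive definite
V = Q + random_psd(k, rng)           # V - Q is positive semi-definite
J = rng.standard_normal((k + 1, k))  # stand-in for the (p x (p-1)) Jacobian

Q_inv, V_inv = np.linalg.inv(Q), np.linalg.inv(V)
gap_inverse = min_eig(Q_inv - V_inv)                    # Q^{-1} >= V^{-1}
gap_sandwich = min_eig(V_inv - V_inv @ Q @ V_inv)       # V^{-1} >= V^{-1} Q V^{-1}
gap_pi = min_eig(J @ (Q_inv - V_inv @ Q @ V_inv) @ J.T) # Pi_0 - Pi_1 >= 0
```

All three smallest eigenvalues are nonnegative up to rounding error. The second ordering holds because $V^{-1} - V^{-1}QV^{-1} = V^{-1}(V - Q)V^{-1}$.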
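Proof of Corollary 1. Let $\cdot$ denote the inner product of two vectors. Theorem 2
implies $\|\hat\beta - \beta_0\| = O_P(n^{-1/2})$ and

\[ \begin{aligned} |\cos(\hat\beta, \beta_0) - 1| &= \big| (\hat\beta - \beta_0) \cdot \beta_0 / \|\hat\beta\| + (\|\beta_0\| - \|\hat\beta\|)/\|\hat\beta\| \big| \\ &\leq 2\|\hat\beta - \beta_0\| / \|\hat\beta\| = O_P(n^{-1/2}). \end{aligned} \]

This completes the proof of Corollary 1. □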
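Proof of Theorem 4. Denote $\theta_0 = (\theta_{01}, \ldots, \theta_{0q})^T$ and $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_q)^T$. Theorem 1 and
Lemma A.4 in the Appendix yield

\[ \begin{aligned} \sup_{x \in A} |g^*(x^T\hat\beta) - g(x^T\beta_0)| \leq{}& \sum_{s=1}^{q} \sup_{(x,\beta) \in A_n} |\hat g_{2s}(x^T\beta; \beta) - g_{2s}(x^T\beta_0)|\,|\hat\theta_s - \theta_{0s}| \\ &+ \sup_{(x,\beta) \in A_n} |\hat g(x^T\beta; \beta, \theta_0) - g(x^T\beta_0)| \\ &+ \sum_{s=1}^{q} \sup_{x \in A} |g_{2s}(x^T\beta_0)|\,|\hat\theta_s - \theta_{0s}| = O_P((nh/\log n)^{-1/2}), \end{aligned} \]

and hence Theorem 4 follows. □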
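Proof of Theorem 5. Decomposing $\hat\sigma^2$ into several parts, we have

\[ \begin{aligned} \hat\sigma^2 ={}& \frac{1}{n}\sum_{i=1}^{n} e_i^2 + \frac{1}{n}\sum_{i=1}^{n} [Z_i^T(\theta_0 - \hat\theta) + g(X_i^T\beta_0) - \hat g(X_i^T\hat\beta; \hat\beta, \theta_0)]^2 \\ &+ \frac{2}{n}\sum_{i=1}^{n} e_i Z_i^T(\theta_0 - \hat\theta) + \frac{2}{n}\sum_{i=1}^{n} e_i [g(X_i^T\beta_0) - \hat g(X_i^T\hat\beta; \hat\beta, \theta_0)] \\ =:{}& I_1 + I_2 + I_3 + I_4. \end{aligned} \]
Note that $\sqrt{n}\,\|\hat\theta - \theta_0\| = O_P(1)$ and using (A.8) of Lemma A.4, we have

\[ \begin{aligned} \sqrt{n}\,|I_2| \leq{}& \frac{2}{n}\sum_{i=1}^{n} \|Z_i\|^2 \sqrt{n}\,\|\hat\theta - \theta_0\|^2 + 2\sqrt{n} \sup_{(x,\beta) \in A_n} |g(x^T\beta_0) - \hat g(x^T\beta; \beta, \theta_0)|^2 \\ ={}& O_P(n^{-1/2}) + O_P((nh^2/\log^2 n)^{-1/2}) \to 0. \end{aligned} \]

Since $E e_i = 0$, we obtain $\sqrt{n}\,I_3 \xrightarrow{P} 0$. Similarly to the proof of (A.17) in the Appendix, we
also have $\sqrt{n}\,I_4 \xrightarrow{P} 0$. This proves that $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2 + o_P(n^{-1/2})$. Therefore, we have

\[ \sqrt{n}\,(\hat\sigma^2 - \sigma^2) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} (e_i^2 - \sigma^2) + o_P(1). \]

The proof can now be completed by employing the central limit theorem. □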



APPENDIX 



The following Lemmas A.1-A.7 are needed to prove Theorems 1, 2, 4, 5. Lemma A.1
gives an important probability inequality and Lemmas A.2 and A.3 provide bounds for the
moments of the relevant estimators. They are used to obtain the rates of convergence for the
estimators of the nonparametric component, and are used in the proof of Lemmas A.4-A.7.
Lemma A.4 presents the uniform rates of convergence in probability for the estimators $\hat g$,
$\hat g_{2s}$ and $\hat g'$. These results are very useful for the nonparametric estimations. The proofs of
Lemmas A.5-A.7, as well as Theorems 4 and 5, rely on Lemma A.4. To simplify the proof
of Theorem 1, we divide the main steps of the proofs into Lemmas A.5 and A.6. Lemma
A.5 is used to obtain the limiting variance of the estimator $\hat\theta$, and Lemma A.6 together with
Lemma A.5 shows that the rate of convergence of the nonlinear section of $e_i$ for $\hat\theta - \theta_0$ is
$o_P(n^{-1/2})$. Lemma A.7 provides the main step for the proof of Theorem 2.

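Lemma A.1 Let $\xi_1(x, \beta), \ldots, \xi_n(x, \beta)$ be a sequence of random variables. Denote
$f_{x,\beta}(V_i) = \xi_i(x, \beta)$ for $i = 1, \ldots, n$, where $V_1, \ldots, V_n$ is a sequence of random variables,
and $f_{x,\beta}$ is a function on $A_n$, where $A_n = \{(x, \beta) : (x, \beta) \in A \times R^p, \|\beta - \beta_0\| \leq cn^{-1/2}\}$ for
a constant $c > 0$. Assume that $f_{x,\beta}$ satisfies

\[ \frac{1}{n}\sum_{i=1}^{n} |f_{x,\beta}(V_i) - f_{x^*,\beta^*}(V_i)| \leq c n^{a} (\|\beta - \beta^*\| + \|x - x^*\|) \tag{A.1} \]

for some constants $x^*$, $\beta^*$, $a > 0$ and $c > 0$. Let $\varepsilon_n > 0$ depend only on $n$. If

\[ P\Big\{ \Big| \frac{1}{n}\sum_{i=1}^{n} \xi_i(x, \beta) \Big| > \frac{1}{2}\varepsilon_n \Big\} \leq \frac{1}{2} \tag{A.2} \]

for $(x, \beta) \in A_n$, then we have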

1 



sup 



>^en 



<c,n^-s--^( sup 2exp( T^g/fM All, (A.3) 

where $c_1 > 0$ is a constant.

Proof. Let $\{\xi_1'(x, \beta), \ldots, \xi_n'(x, \beta)\}$ be an independent version of $\{\xi_1(x, \beta), \ldots, \xi_n(x, \beta)\}$.
Now generate independent sign random variables $\sigma_1, \ldots, \sigma_n$ for which $P\{\sigma_i = 1\} = P\{\sigma_i = -1\} = \frac{1}{2}$,
and $\{\sigma_i, 1 \leq i \leq n\}$ independent of $\{\xi_i(x, \beta), \xi_i'(x, \beta), 1 \leq i \leq n\}$. By symmetry,
$\sigma_i(\xi_i - \xi_i')$ has the same distribution as $(\xi_i - \xi_i')$. The symmetrization Lemma in Pollard
(1984) implies



P < sup 



1 " 



n 



> Sr 



< 2P i sup 

[ix,l3)£An 

= 2P I sup 

< 4P J sup 

{{x,/3)eAn 



1 



^ 1=1 



1 



n 



i=l 



1 



> -Sr. 



(A.4) 



Let Pn be the empirical measure that puts equal mass — at each of the n observations 

n 

Vi,. . . ,Vn. Let — {fx,p{-) ■ \\x\\ < C, < S} be a class of functions indexed by x 

and P consisting of /x,/3(Vi) = $,i{x,(3). Denote V = (Vi, . . . , Ki). Given V, choose function 
fi-i ■ ■ ■ 1 fm-i in ^1 such that 

1 " 

j€{l,...,m} n ^^-^ 

for each /j,,^ in J^. Let N{en, Pn,^) be the minimum m for all sets that satisfies (A. 5). 
Denote /*^ for the f° at which the minimum is achieved, we then have 





1 " 




1 




< sup 








[{x,l3)eAn 







= P < sup 

[(a;,/3)e-4„ 

<p\ sup 



E^i/^>/9(^i) 



1 " 

-E^^/:,/3(^0 



1=1 



< N(en, Pn, T) max P 



E^^/;(^^) 



^^=1 



(A.6) 



Now we need to determine the order of $N(\varepsilon_n, P_n, \mathcal{F})$. For each set satisfying (A.5), each $f_j^o$
has a pair $(x_j, \beta_j)$ such that $f_j^o(v) = f_{x_j,\beta_j}(v)$. Then for all $(x, \beta) \in A_n$, we have from (A.1)
that



1 



n 



Next, we want to bound the right-hand side of the above formula by $\varepsilon_n$. Thus for each
$(x, \beta) \in A_n$, we need a pair $(x_j, \beta_j)$ within radius $r_n = O(n^{-a}\varepsilon_n)$ of $(x, \beta)$. Therefore, the
number $N$ needed to satisfy (A.5) is bounded by $r_n^{-p} r_n^{-p} = c n^{2pa} \varepsilon_n^{-2p}$, i.e.



(A.7) 



Now conditioning on $V$, $\sigma_i f_j^o(V_i)$ is bounded. Hoeffding's inequality [Hoeffding (1963)] yields



P 



n 



i=l 



This together with (A.4), (A.6) and (A.7) proves (A.3). □ 
Lemma A.2 Suppose that conditions C1, C2 and C3(i) hold. If $h = cn^{-a}$ for any
$0 < a < 1/2$ and some constant $c > 0$, then, for $i = 1, \ldots, n$, we have

2 



E 



E 



E 



g'iXjPo) -YWnjiXjPo;M9{Xjf3o) 



0{hl) 



and 



E 



T 2 



J2Wni{Xj(3o;PoMXjf3o) - v{Xj(5,) 
i=i 



where $\nu(t) = g'(t) g_{3s}(t)$ and $g_{3s}$ is the $s$th component of $g_3(t) = E(X \mid X^T\beta_0 = t)$.
Proof. See Lemma 1 of Zhu and Xue (2006).
Lemma A.3 Under the assumptions of Lemma A. 2, we have 



□ 



and 



E{Wl^Xjl3^-p^)}^0{{nh)-^), 

e\ E Wl^{XjP,-pA=0{{nh)-'), 



E{Y:KM^(5-(5))^0{{nhr') 



e\ E W^j{X:f(3o;Po)]^0{inhir'). 
Proof. See Lemma 2 of Zhu and Xue (2006). □

Lemma A.4 Suppose that conditions C1-C4 and C5(i) hold. We then have

\[ \sup_{(x,\beta) \in A_n} |g(x^T\beta_0) - \hat g(x^T\beta; \beta, \theta_0)| = O_P((nh/\log n)^{-1/2}) \tag{A.8} \]

and

\[ \sup_{(x,\beta) \in A_n} |g_{2s}(x^T\beta_0) - \hat g_{2s}(x^T\beta; \beta)| = O_P((nh/\log n)^{-1/2}). \tag{A.9} \]

If in addition, C5(ii) also holds, then we have

\[ \sup_{(x,\beta) \in A_n} |g'(x^T\beta_0) - \hat g'(x^T\beta; \beta, \theta_0)| = O_P((nh_1^3/\log n)^{-1/2}), \tag{A.10} \]

where $A_n = \{(x, \beta) : (x, \beta) \in A \times R^p, \|\beta - \beta_0\| \leq cn^{-1/2}\}$ for a constant $c > 0$.

Proof. We only prove (A.8); the proofs for (A.9) and (A.10) are similar. Write $\tilde g(X_i, e_i) =
g(x^T\beta_0) - g(X_i^T\beta_0) - e_i$, $i = 1, \ldots, n$. We have



(All) 



i=l 



Let $\xi_i(x, \beta) = n(nh/\log n)^{1/2} W_{ni}(x^T\beta; \beta)\,\tilde g(X_i, e_i)$, $f_{x,\beta}(V_i) = \xi_i(x, \beta)$, $V_i = (X_i, e_i)$, $i =
1, \ldots, n$. Using Lemma A.1, we have to verify (A.1) and (A.2). A simple calculation yields
(A.1), so we now verify (A.2). By Lemmas A.2 and A.3, and noting that $\sup_{(x,\beta) \in A_n} |g(x^T\beta) -
g(x^T\beta_0)| = O(n^{-1/2})$, we have

l2 



E[g{x''f3o)-g{x''f3;P,eo)f^E 



J2Wni{x'^P;P)g{Xi,ei) 



Li=l 



< cE 



g(x'^(3)-Y,Wni(x'^(3;(3)g(X7(3) 



1=1 



+ cE}^Y.W^,{x''P;(3)^ + 0{n-') 
< ch^ + c{nh)-\ 

Given a M > 0, by Chevbychev's inequality and (A. 12), we have 

1 



(A12) 



n 



1=1 



> -M} < AM-'^E 
2 I - 



-Eei(^,/3) 



1 2 



n 



i=l 



< AM-^nh{logny^E 



1 2 



J2Wni{x'^f3;f3)g{Xi,ei) 



u=i 



< cM-^{cnh^ + c(logn)-^). 



(A. 13) 
Therefore, from C5(i), we can choose $M$ large enough so that the right hand side of (A.13)
is less than or equal to $\frac{1}{2}$. Hence, (A.2) is satisfied. We now can use (A.3) of Lemma A.1 to
get (A.8). By Lemma A.3, we obtain

n n 2 

n-^Y.Eei{x,l3) = nh{\ogn)-^Y.E ^ni{x^ P', (^Yg{Xi,ei)\ 

i=l 1=1 

n 



This imphes that n ^f{x, (5) — Op((logn) ). Hence, from Lemma A.l we have 



P < sup 



1 " 11 

IT' ,_i ^ I 



The right-hand side of the above formula tends to zero when M is large enough. Therefore, 
(A. 8) follows. □ 
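Lemma A.5 Under the assumptions of Theorem 1, we have

\[ n^{-1}\tilde Z^T\tilde Z \xrightarrow{P} \Sigma, \]

where $\Sigma$ is defined in condition C6.

Proof. Noting that $\tilde Z = (I - S)Z$, the $(i, s)$ element of $\tilde Z$ is

\[ \tilde Z_{is} = [Z_{is} - g_{2s}(X_i^T\beta_0)] + [g_{2s}(X_i^T\beta_0) - \hat g_{2s}(X_i^T\beta_0; \beta_0)]. \]
The $(s, t)$ element of $\tilde Z^T\tilde Z$ is

\[ \begin{aligned} \sum_{i=1}^{n} \tilde Z_{is}\tilde Z_{it} ={}& \sum_{i=1}^{n} [Z_{is} - g_{2s}(X_i^T\beta_0)][Z_{it} - g_{2t}(X_i^T\beta_0)] \\ &+ \sum_{i=1}^{n} [Z_{is} - g_{2s}(X_i^T\beta_0)][g_{2t}(X_i^T\beta_0) - \hat g_{2t}(X_i^T\beta_0; \beta_0)] \\ &+ \sum_{i=1}^{n} [Z_{it} - g_{2t}(X_i^T\beta_0)][g_{2s}(X_i^T\beta_0) - \hat g_{2s}(X_i^T\beta_0; \beta_0)] \\ &+ \sum_{i=1}^{n} [g_{2s}(X_i^T\beta_0) - \hat g_{2s}(X_i^T\beta_0; \beta_0)][g_{2t}(X_i^T\beta_0) - \hat g_{2t}(X_i^T\beta_0; \beta_0)] \\ =:{}& I_1 + I_2 + I_3 + I_4. \end{aligned} \tag{A.14} \]
By the law of large numbers, we have

\[ n^{-1} I_1 \xrightarrow{P} E\{[Z_{is} - E(Z_{is} \mid X_i^T\beta_0)][Z_{it} - E(Z_{it} \mid X_i^T\beta_0)]\} =: \Sigma_{st}, \tag{A.15} \]

where $\Sigma_{st}$ is the $(s, t)$ element of $\Sigma$. Noting that

\[ \frac{1}{n}\sum_{i=1}^{n} |Z_{is} - g_{2s}(X_i^T\beta_0)| \xrightarrow{P} E|Z_{is} - g_{2s}(X_i^T\beta_0)| < \infty, \]

this together with (A.9) of Lemma A.4 proves that

\[ n^{-1} |I_2| \leq O_P(1) \sup_{(x,\beta) \in A_n} |g_{2t}(x^T\beta_0) - \hat g_{2t}(x^T\beta; \beta)| \xrightarrow{P} 0. \]

Similarly, we can prove $n^{-1} I_3 \xrightarrow{P} 0$ and $n^{-1} I_4 \xrightarrow{P} 0$. This together with (A.14) and
(A.15) proves Lemma A.5. □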

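Lemma A.6 Under the assumptions of Theorem 1, we have

\[ n^{-1/2}\sum_{i=1}^{n} \tilde Z_i\,[g(X_i^T\beta_0) - \hat g(X_i^T\beta_0; \beta_0, \theta_0)] \xrightarrow{P} 0. \]

Proof. The $s$th component of $\tilde Z^T G$ is

\[ \begin{aligned} \sum_{i=1}^{n} \tilde Z_{is}[g(X_i^T\beta_0) - \hat g(X_i^T\beta_0; \beta_0, \theta_0)] ={}& \sum_{i=1}^{n} [Z_{is} - g_{2s}(X_i^T\beta_0)][g(X_i^T\beta_0) - \hat g(X_i^T\beta_0; \beta_0, \theta_0)] \\ &+ \sum_{i=1}^{n} [g_{2s}(X_i^T\beta_0) - \hat g_{2s}(X_i^T\beta_0; \beta_0)][g(X_i^T\beta_0) - \hat g(X_i^T\beta_0; \beta_0, \theta_0)] \\ =:{}& J_1 + J_2. \end{aligned} \]

For $J_2$, from (A.8) and (A.9) of Lemma A.4 we have

\[ \begin{aligned} n^{-1/2}|J_2| \leq{}& \sqrt{n} \sup_{(x,\beta) \in A_n} |g_{2s}(x^T\beta_0) - \hat g_{2s}(x^T\beta; \beta)| \\ &\times \sup_{(x,\beta) \in A_n} |g(x^T\beta_0) - \hat g(x^T\beta; \beta, \theta_0)| = O_P((nh^2/\log^2 n)^{-1/2}). \end{aligned} \tag{A.16} \]

Noting that $nh^2/\log^2 n \to \infty$, we obtain $n^{-1/2} J_2 \xrightarrow{P} 0$. It remains to prove that

\[ n^{-1/2} J_1 \xrightarrow{P} 0, \tag{A.17} \]

as this together with (A.16) implies Lemma A.6. To prove (A.17), we only need to show
that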



\[ \sup_{\beta \in B_n^*} \Big| \frac{1}{\sqrt{n}}\sum_{i=1}^{n} [Z_{is} - g_{2s}(X_i^T\beta_0)][g(X_i^T\beta_0) - \hat g(X_i^T\beta; \beta, \theta_0)] \Big| \xrightarrow{P} 0, \tag{A.18} \]

where $B_n^* = \{\beta : \|\beta - \beta_0\| \leq cn^{-1/2}\}$ for a constant $c > 0$. Toward this goal, we note that
Lemma A.1 can be used when the variable $x$ is removed. Let

\[ f_\beta(V_i) = \xi_i(\beta), \quad V_i = (X_i, Z_{is}, e_i), \quad i = 1, \ldots, n. \]

We now verify that (A.l) and (A. 2) are satisfied. By the condition C3(ii) on the kernel 
function, we calculate that 

- E MVi) - MVi)\ < cn'/'h-'\\P - = cn'^WP - P*\ 



n 



i=l 



where a = | + 2A(| < A < |). Hence, (A.l) is satisfied. 

We next verify that (A. 2) is satisfied. Denote Q = Z^g — g2s{Xjl3o). Prom condition C4, 
Lemmas A. 2 and A3, we have 



E 



n 



< 2n-^ ^ E < 



1=1 



g{Xj(5o) -Y.Wr,^{Xj(5;(5)g{Xj(5o) 



E{Cl\XjP,) 



+ 2ri-' E E E E ^[^ni(^7/5; mmiX'^P-, miCkeM 

i j k I 

{n 
EWUXJ(5-, /?) + E EWl^{Xj(5- (5) 
i=l i^j 

< ch^ + cn-^ + c{nh)-^ — > 0. 



Hence, we can obtain 



when n large enough. Therefore, (A. 2) is satisfied. By (A. 8) of Lemma A.4, we have 
^ Ee-(/3) = 0p(l) sup [g{x^(3o)-g{x^(3;(3,eo)f = Op{nh/logn)-'). 



i=l (a:,/3)6A 



By using Lemma A.l, we obtain 



P < sup 

[{x,l3)eAn 



n 1 ^ 

> < cn2P"£-2Pexp(-cn/i/logn) — > 0. 
i=i 2 J 



1 

n 
by $nh/\log n \to \infty$. This proves (A.18) and thus completes the proof of Lemma A.6. □
Lemma A.7 Suppose that conditions C1-C6 are satisfied. Then we have

\[ \sup_{\beta^{(r)} \in B_n} \|R(\beta^{(r)}) - U(\beta_0^{(r)}) + nV(\beta^{(r)} - \beta_0^{(r)})\| = o_P(\sqrt{n}), \]

where $B_n = \{\beta^{(r)} : \|\beta^{(r)} - \beta_0^{(r)}\| \leq Cn^{-1/2}\}$ for a constant $C > 0$, $V$ is defined in condition
C6,

\[ R(\beta^{(r)}) = \sum_{i=1}^{n} [Y_i - Z_i^T\theta_0 - \hat g(X_i^T\beta; \beta, \theta_0)]\,\hat g'(X_i^T\beta; \beta, \theta_0)\, J^T_{\beta^{(r)}} X_i, \tag{A.19} \]

and

\[ U(\beta_0^{(r)}) = \sum_{i=1}^{n} e_i\, g'(X_i^T\beta_0)\, J^T_{\beta_0^{(r)}} [X_i - E(X_i \mid X_i^T\beta_0)]. \]

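Proof. Separating $R(\beta^{(r)})$, we have

\[ \begin{aligned} R(\beta^{(r)}) ={}& \sum_{i=1}^{n} e_i\, g'(X_i^T\beta_0)\, J^T_{\beta^{(r)}} [X_i - E(X_i \mid X_i^T\beta_0)] \\ &+ \sum_{i=1}^{n} e_i [\hat g'(X_i^T\beta; \beta, \theta_0) - g'(X_i^T\beta_0)]\, J^T_{\beta^{(r)}} X_i \\ &- \sum_{i=1}^{n} g'(X_i^T\beta_0)\, J^T_{\beta^{(r)}} X_i \{\hat g(X_i^T\beta; \beta, \theta_0) - \hat g(X_i^T\beta_0; \beta_0, \theta_0)\} \\ &- \sum_{i=1}^{n} g'(X_i^T\beta_0)\, J^T_{\beta^{(r)}} \{X_i [\hat g(X_i^T\beta_0; \beta_0, \theta_0) - g(X_i^T\beta_0)] - e_i\, g_3(X_i^T\beta_0)\} \\ &- \sum_{i=1}^{n} [\hat g(X_i^T\beta; \beta, \theta_0) - g(X_i^T\beta_0)][\hat g'(X_i^T\beta; \beta, \theta_0) - g'(X_i^T\beta_0)]\, J^T_{\beta^{(r)}} X_i \\ =:{}& R_1(\beta^{(r)}) + R_2(\beta^{(r)}) - R_3(\beta^{(r)}) - R_4(\beta^{(r)}) - R_5(\beta^{(r)}). \end{aligned} \tag{A.20} \]

Noting that $J_{\beta^{(r)}} - J_{\beta_0^{(r)}} = O_P(n^{-1/2})$ for all $\beta^{(r)} \in B_n$, we have

\[ \sup_{\beta^{(r)} \in B_n} \|R_1(\beta^{(r)}) - U(\beta_0^{(r)})\| = o_P(\sqrt{n}). \tag{A.21} \]

Since $\|\beta^{(r)} - \beta_0^{(r)}\| \leq Cn^{-1/2}$ implies $\|\beta - \beta_0\| \leq Cn^{-1/2}$ for all $\beta^{(r)} \in B_n$, similar to the proof
of (A.17) we can show that

\[ \sup_{\beta^{(r)} \in B_n} \|R_2(\beta^{(r)})\| = o_P(\sqrt{n}). \tag{A.22} \]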
For $R_3(\beta^{(r)})$, by a Taylor expansion of $\beta^{(r)} - \beta_0^{(r)}$ with a suitable mean $\bar\beta^{(r)} \in B_n$ and
the corresponding $\bar\beta$, we get

n 

i=l 

n 

i=l 

n 
i=l 

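By (A.10) of Lemma A.4 and the law of large numbers, we obtain that

\[ \sup_{\beta^{(r)} \in B_n} \|R_{31}(\beta^{(r)}, \bar\beta^{(r)})\| = o_P(\sqrt{n}) \]

and

\[ \sup_{\beta^{(r)} \in B_n} \|R_{32}(\beta^{(r)}, \bar\beta^{(r)}) - nV(\beta^{(r)} - \beta_0^{(r)})\| = o_P(\sqrt{n}). \]

Therefore, we have

\[ \sup_{\beta^{(r)} \in B_n} \|R_3(\beta^{(r)}) - nV(\beta^{(r)} - \beta_0^{(r)})\| = o_P(\sqrt{n}). \tag{A.23} \]

We now consider $R_4(\beta^{(r)})$. Write $R_4(\beta^{(r)}) = J^T_{\beta^{(r)}} R_4^*(\beta^{(r)})$. Let $R_{4s}^*$ denote the $s$th
component of $R_4^*(\beta^{(r)})$. First, from Lemmas A.2 and A.3 we have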

n-'E{Rll) < cn-' J^^lfl Wni{XjPo; P^)g'{X] P,)X^, - g' {Xj P^)g^,{Xj P^) 

n I 

1=1 [j=l 

< c{nh)~^ + cv^ + cn/i^ — > 0. 

This implies

\[ \sup_{\beta^{(r)} \in B_n} \|R_4(\beta^{(r)})\| = o_P(\sqrt{n}), \tag{A.24} \]

and by Lemma A.4 we obtain

\[ \sup_{\beta^{(r)} \in B_n} \|R_5(\beta^{(r)})\| = o_P(\sqrt{n}). \tag{A.25} \]
Substituting (A.21)-(A.25) into (A.20), we prove Lemma A.7. □
TABLE 1

Simulation results for $\hat\theta$ with $\beta_Z$ and $\beta_0$ parallel

                     Resulting estimate          One-step iterated estimate
                   Bias      SD       MSE          Bias      SD       MSE
PPR               0.0058   0.0706   0.00502       0.0046   0.0701   0.00493
SIR5              0.0095   0.0862   0.00753       0.0083   0.0869   0.00762
SIR10             0.0113   0.0788   0.00634       0.0098   0.0808   0.00663
$\beta_0$ given   0.0031   0.0660   0.00436







TABLE 2

Simulation results for $\hat\theta$ with $\beta_Z$ and $\beta_0$ orthogonal

                     Resulting estimate          One-step iterated estimate
                   Bias      SD       MSE          Bias      SD       MSE
PPR              -0.0087   0.0972   0.00952      -0.0047   0.0711   0.00508
SIR5             -0.0115   0.1395   0.01960      -0.0072   0.0919   0.00850
SIR10            -0.0102   0.1362   0.01865      -0.0083   0.0959   0.00926
$\beta_0$ given  -0.0024   0.0696   0.00485







TABLE 3

Simulation results for the angles between $\hat\beta$ and $\beta_0$

          $\beta_Z$ and $\beta_0$ parallel     $\beta_Z$ and $\beta_0$ orthogonal
           Mean      SD       MSE               Mean      SD       MSE
PPR       0.0148   0.0056   0.00025            0.0157   0.0066   0.00029
SIR5      0.0467   0.0223   0.00268            0.0482   0.0232   0.00286
SIR10     0.0496   0.0230   0.00299            0.0528   0.0229   0.00331
Figure 1: Curve estimate for a single replication of the quadratic model simulation study,
with orthogonal $\beta_Z$ and $\beta_0$. The true curve (solid curve), the mean of $g^*$ with GCV bandwidth
(dashed curve) and a fixed optimal bandwidth $h_{opt} = 0.439$ (dotted curve) over 2000
simulations are shown.